Introduction


Prostate cancer is one of the leading causes of cancer-related deaths among men in the United
States and the most commonly diagnosed cancer in American males [6]. Most prostate
cancer-related deaths are due to advanced disease. Identifying novel biomarkers for advanced
prostate cancer and uncovering the molecular mechanism of disease progression will significantly
benefit prostate cancer patients. Our previous studies defined SChLAP1 (Second Chromosome Locus
Associated with Prostate-1) as a prognostic biomarker of advanced prostate cancer, and also
showed that SChLAP1 promotes prostate cancer migration and invasion by interacting with the
SWI/SNF complex. We hypothesize that elucidating the mechanism of SChLAP1-mediated abrogation
of SWI/SNF function in prostate cancer progression will extend our knowledge of prostate cancer
biology and, more importantly, reveal novel therapeutic targets against SChLAP1 and/or its
downstream factors. To test our hypothesis, we propose the following specific aims:

1) To investigate the precise interaction between SChLAP1 and SWI/SNF complex
components.

2) To study the effects of SChLAP1 on genome-wide nucleosome occupancy and
H3K27Me3 in metastatic prostate cancer.

3) To determine the potential clinical utility of targeting SChLAP1, the SChLAP1-SWI/SNF
interaction and/or downstream effectors as a treatment modality in metastatic
prostate cancer.

1.  KEYWORDS 

LincRNA, SChLAP1, SWI/SNF, EZH2, ASO

2.  ACCOMPLISHMENTS 

What  were  the  major  goals  of  the  project? 

The following three aims were proposed.

Specific Aim-1: To investigate the precise interaction between SChLAP1 and SWI/SNF complex
components.

Specific Aim-2: To examine the effects of SChLAP1 on genome-wide nucleosome occupancy and
H3K27Me3 in metastatic prostate cancer.

Specific Aim-3: To determine the potential clinical utility of targeting SChLAP1, the
SChLAP1-SWI/SNF interaction and downstream histone modification as a treatment modality in
metastatic prostate cancer.

What was accomplished under these goals?

The tasks proposed under each specific aim are listed below. Results obtained under
each task are shown in the form of figures.

Specific Aim-1: To investigate the precise interaction between SChLAP1 and SWI/SNF
complex components.

Task 1: To profile the potential protein-binding (protected) sites of SChLAP1 by PIP-seq

We performed PIP-seq in LNCaP cells to characterize the protein-binding sites of SChLAP1
transcripts in an unbiased manner. As shown in Figure 1, we profiled multiple protected sites
across the entire SChLAP1 transcript. Among those protected sites, Exon 5 of SChLAP1 is one of
the most highly protected exonic regions, which suggests that Exon 5 might be a critical
protein-binding region. Interestingly, we also found one highly protected intronic region, which
suggests that the nascent SChLAP1 RNA transcript might also be bound by protein.

Figure 1. Genome browser view of PIP-seq results for SChLAP1 transcripts. Intronic and exonic
protected regions are highlighted in red.


Task 2: To validate the potential protein-binding (protected) sites of SChLAP1 identified by
PIP-seq in multiple prostate cancer cells

We employed qPCR in LNCaP and VCaP cells to validate the potential exonic and intronic
protein-binding (protected) sites of SChLAP1. RNase I specifically digests single-stranded RNA,
and RNase III specifically digests double-stranded RNA. As shown in Figure 2A, for the Exon 5
region, formaldehyde-fixed LNCaP and VCaP RNA could be digested only by RNase I and not by
RNase III, suggesting that protection of Exon 5 is not due to double-stranded RNA formation but
might be due to protein-RNA interaction. As with the Exon 5 region, the protected intronic
region of SChLAP1 was also confirmed by qPCR (Figure 2B).


[Figure 2 panels: A, SChLAP1 exonic site; B, SChLAP1 intronic site]

Figure 2. RNA from LNCaP and VCaP cells was treated with RNase I and RNase III. Levels of the
SChLAP1 exonic and intronic protected sites were quantified by qRT-PCR.


Task 3: To precisely characterize the subcellular localization of SChLAP1 in prostate cancer
cells

To precisely characterize the subcellular localization of SChLAP1 in prostate cancer, we
performed single-molecule fluorescence in situ hybridization (smFISH) for SChLAP1 in LNCaP
cells. Compared with conventional fluorescence in situ hybridization (FISH), the smFISH assay
provides absolute quantification of RNA molecules at the single-cell level. As shown in
Figure 3A, we used SChLAP1 exon-specific smFISH probes (red) and intron-specific smFISH probes
(green), and found that SChLAP1 is a highly nuclear-specific transcript. By counting individual
SChLAP1 molecules, we determined the absolute expression of SChLAP1 in LNCaP cells (Figure 3B).

Figure 3. Localization of SChLAP1 by smFISH. A, smFISH for endogenous SChLAP1 (E, exons;
I, introns) in LNCaP cells. B, Quantification of SChLAP1 expression. C and D, smFISH for
endogenous SChLAP1 (E, exons; I, introns) in tissue (metastatic site). Percentages depict the
fraction of samples with <6 spots (C) or >6 spots (D) per cell.


Task 4: To validate the subcellular localization of SChLAP1 in prostate cancer patient
biopsies

We further employed the smFISH approach to assess the subcellular localization of SChLAP1 in
metastatic prostate cancer biopsies. As shown in Figure 3C, the subcellular localization
pattern of SChLAP1 in prostate cancer biopsies is consistent with that observed in prostate
cancer cells.




Task 5: To determine the co-localization of SChLAP1 and the SWI/SNF complex in situ by
immunofluorescence-FISH.

After determining the subcellular localization of SChLAP1 in a prostate cancer cell line and
tissue, we assessed the co-localization of SChLAP1 with BRG or BRM in LNCaP cells. As shown in
Figure 4, we successfully performed immunofluorescence-FISH for SChLAP1 (red) and
BRG/BRM (green). Because BRG and BRM are abundant in LNCaP cells, it is difficult to conclude
that SChLAP1 is co-localized with the SWI/SNF complex.


Figure 4. Co-localization of SChLAP1 and BRG or BRM in LNCaP cells by
immunofluorescence-FISH. A, Co-localization of SChLAP1 and BRG in LNCaP cells. B,
Co-localization of SChLAP1 and BRM in LNCaP cells.

Specific Aim-2: To examine the effects of SChLAP1 on genome-wide nucleosome occupancy
and H3K27Me3 in metastatic prostate cancer.


[Figure 5, panel A (schlapOE-alltss); legend: LacZ, iso1, iso2]

Figure 5. TSS-aligned overlay of nucleosome occupancy of RWPE-LacZ, RWPE-SChLAP1 iso1, and
RWPE-SChLAP1 iso2 cells. Genome-wide nucleosome occupancy was determined by MNase-seq.
A, TSS-aligned overlay of nucleosome occupancy of all TSS. B, TSS-aligned overlay of
nucleosome occupancy of TSS (FPKM >= 1). C, TSS-aligned overlay of nucleosome occupancy of
TSS (FPKM >= 5). D, TSS-aligned overlay of nucleosome occupancy of TSS (FPKM >= 10).


Task 1: To profile the SChLAP1 overexpression-mediated nucleosome occupancy changes
by MNase-seq analysis

To profile SChLAP1-mediated changes in nucleosome positioning surrounding transcriptional
start sites (TSS), we stably overexpressed two SChLAP1 isoforms in RWPE cells and performed
MNase-seq in the SChLAP1-overexpressing cells and parental cells. As shown in Figure 5A,
nucleosome positioning across all TSS was not significantly changed by SChLAP1. However, the
TSS of highly expressed transcripts showed nucleosome positioning changes downstream of the
TSS (Figure 5B, 5C and 5D).
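
As an illustration of what a TSS-aligned overlay computes, the following is a minimal sketch, not the pipeline used in this project. It assumes a hypothetical per-base coverage track stored as a Python list per chromosome and a hypothetical list of (chromosome, position, strand) TSS records; it averages the signal in a fixed window around each TSS and flips minus-strand windows so that "downstream" always points the same way.

```python
def tss_aligned_profile(coverage, tss_list, flank=1000):
    """Average per-base signal in a window of +/- `flank` bases around each TSS.

    coverage: dict mapping chromosome name -> list of per-base signal values
              (hypothetical in-memory representation, for illustration only).
    tss_list: iterable of (chromosome, position, strand) tuples, 0-based positions.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for chrom, pos, strand in tss_list:
        track = coverage[chrom]
        # Skip TSSs whose window would run off either end of the chromosome.
        if pos - flank < 0 or pos + flank >= len(track):
            continue
        window = track[pos - flank : pos + flank + 1]
        if strand == "-":
            # Reverse minus-strand windows so index 0 is always upstream.
            window = window[::-1]
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    return [t / used for t in totals] if used else totals
```

Computing one such profile per condition (for example LacZ, iso1 and iso2) and per expression threshold would give curves comparable in spirit to the panels of Figure 5.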


Task 2: Integrative analysis of SChLAP1 overexpression-mediated nucleosome and
transcriptomic changes.

To further functionally annotate the SChLAP1-mediated nucleosome position changes, all
genes were divided into six groups based on their SChLAP1-mediated TSS nucleosome
positioning patterns (Figure 6A and 6B). For genes belonging to groups 1, 2 and 3, the TSS
contained a nucleosome-depleted region (NDR) downstream; for genes belonging to groups 4, 5
and 6, the TSS contained a nucleosome-gain region (NGR) downstream. Functional annotation
analysis was performed for genes containing an NDR or NGR, and the most significant relevant
signatures are shown (Figure 6C).

Figure 6. Nucleosome density is decreased upon SChLAP1 overexpression. A, K-means clustering
of nucleosome signal (MNase-seq) in cells expressing LacZ, SChLAP1 isoform 1 and SChLAP1
isoform 2. Each row represents ±1 kb around the TSS of expressed genes (FPKM >1). B, The
mean profiles of nucleosome signal for each subgroup. C, Gene ontology analysis of
SChLAP1-regulated genes.

Task 3: To examine the effects of SChLAP1 on H3K27Me3 levels and genome-wide binding
sites of the PRC2 subunits in SChLAP1-expressing prostate cells.

In  progress 

Specific Aim-3: To determine the potential clinical utility of targeting SChLAP1, the
SChLAP1-SWI/SNF interaction and downstream histone modification as a treatment modality in
metastatic prostate cancer.

Task 1: To validate the in vitro knockdown efficiency of SChLAP1 ASOs in multiple prostate
cancer cell lines

Based on our preliminary results, we examined the knockdown efficiency of SChLAP1 ASOs in
22RV1, LNCaP and C4-2B cells using two different ASO delivery methods (Figure 7).


Figure 7. Validation of SChLAP1 ASO knockdown efficiency in multiple prostate cancer cell
lines. A, SChLAP1 ASOs were delivered via transfection (20 nM) into LNCaP and 22RV1 cells;
levels of SChLAP1 were quantified by qRT-PCR. B, SChLAP1 ASOs were delivered via free
uptake (2.5 µM) into LNCaP and 22RV1 cells; levels of SChLAP1 were quantified by qRT-PCR.

Task 2: To investigate the antitumor effect of SChLAP1 ASOs in multiple prostate cancer
cell lines

In LNCaP, C4-2B and 22RV1 cells, we performed in vitro proliferation and invasion assays
after SChLAP1 knockdown. As shown below, knockdown of SChLAP1 did not significantly inhibit
cell proliferation (Figure 8), but slightly decreased invasion in LNCaP and 22RV1 cells
(Figure 9).


Figure 8. Proliferation of LNCaP, C4-2B and 22RV1 cells after SChLAP1 knockdown by ASOs.


Figure 9. Invasiveness of LNCaP, C4-2B and 22RV1 cells after SChLAP1 knockdown by ASOs.

Task 3: To determine BRM as a critical synthetic lethal target in SChLAP1-expressing
prostate cancer
In progress.

Task 4: To determine the sensitivity to EZH2 inhibitor endowed by SChLAP1

We tested the growth-inhibitory effect of the EZH2 inhibitor EPZ-6438 in SChLAP1-expressing
prostate cancer cell lines (LNCaP and 22RV1) and a SChLAP1 non-expressing prostate cancer cell
line (PC3). Furthermore, the IC50 of EPZ-6438 was examined in multiple SChLAP1-expressing and
SChLAP1 non-expressing prostate cancer cell lines; we found that SChLAP1-expressing prostate
cancer cells are more sensitive to EPZ-6438.

Figure 10. The anti-tumor effect of EPZ-6438 in multiple prostate cancer cell lines. A, The
growth-inhibitory effect of EPZ-6438 in LNCaP, 22RV1 and PC3. B, IC50 plot of EPZ-6438 in
multiple prostate cancer cell lines. C, Effect of EPZ-6438 on H3K27Me3 in LNCaP, 22RV1 and PC3.


Task 5: To determine the tumor growth inhibitory effect of EPZ-6438 in C4-2B and 22RV1
xenografts.

In C4-2B xenografts, mice were dosed orally with vehicle (0.5% CMC), enzalutamide (30 mg/kg) or
EPZ-6438 (250 mg/kg, QD or BID). Tumor volume, tumor weight and animal body weight were
monitored, and western blotting was performed to check the effect of EPZ-6438 on H3K27Me3. As
shown below, EPZ-6438 efficiently decreased H3K27Me3 levels in C4-2B xenografts (Figure 11D),
but tumor volume and weight were not significantly affected by EPZ-6438 (Figure 11A and 11B).
In addition, we tested EPZ-6438 in 22RV1 xenografts at a higher dose (500 mg/kg). As in C4-2B
xenografts, EPZ-6438 did not significantly inhibit 22RV1 xenograft growth in vivo (Figure 11E,
11F and 11G).

Figure 11. Tumor growth inhibitory effect of EPZ-6438 in vivo. A, B and C, Comparison of the
effect of vehicle, enzalutamide (30 mg/kg) and EPZ-6438 (250 mg/kg, QD or BID) on tumor volume
(A), tumor weight (B) and animal body weight changes (C) in C4-2B xenografts. D, The effect of
EPZ-6438 on H3K27Me3 in C4-2B xenografts. E and F, The effect of vehicle and EPZ-6438
(500 mg/kg) on tumor volume (E) and tumor weight (F) in 22RV1 xenografts. G, The effect of
EPZ-6438 on H3K27Me3 in 22RV1 xenografts.

What opportunities for training and professional development has the project provided?

Several  opportunities  for  training  and  professional  development  were  provided  while  working 
on  this  project. 

One-on-one  meeting  with  mentor:  Data  was  discussed  with  mentor  (Dr.  Arul  Chinnaiyan)  in 
biweekly  individual  meetings  and  via  monthly  progress  reports.  Constant  inputs  and  advice 
were  provided. 

Presentations  at  Scientific  Conference:  Data  obtained  was  presented  at  scientific  conferences: 
1.  Annual  AACR  conference  (April,  2017) 

Manuscript  writing: 

Weekly  lab  meetings:  All  important  research  findings  were  presented  during  weekly  lab 
meetings  in  front  of  the  group. 

How  were  the  results  disseminated  to  communities  of  interest? 

Results from the grant were presented in the form of an AACR poster.

What  do  you  plan  to  do  during  the  next  reporting  period  to  accomplish  the  goals? 

By the next reporting period, I anticipate accomplishing all the aims proposed in the study. We
also anticipate publishing a manuscript describing the results of co-targeting AR and EZH2 in
SChLAP1-expressing prostate cancer cells.

3.  IMPACT: 

What  was  the  impact  on  the  development  of  the  principal  discipline(s)  of  the  project? 

Once completed, the proposed project “Biological characterization and clinical utilization of
metastatic prostate cancer-associated lincRNA SChLAP1” will have a substantial impact on
understanding the biological function of SChLAP1 in prostate cancer progression. We have
proposed to study the role of SChLAP1 in the development and progression of prostate cancer as
well as to explore the potential of lncRNAs to serve as biomarkers and therapeutic targets for
metastatic prostate cancer.




To date, we have discovered the potential protein-binding sites of SChLAP1 and determined the
specific localization pattern of SChLAP1 in prostate cancer cells and biopsies. We also showed
that, by interacting with the SWI/SNF complex, SChLAP1 mediates significant nucleosome
occupancy changes in prostate cells. More importantly, we tested the translational value of
SChLAP1 as a biomarker and therapeutic target for prostate cancer.

We anticipate that, upon completion, this project will advance our knowledge of the critical
role of lincRNAs in metastatic prostate cancer and offer potential translational opportunities
for future treatment of progressive prostate cancer.

What  was  the  impact  on  other  disciplines? 

Nothing  to  report 

What  was  the  impact  on  technology  transfer? 

Nothing  to  report 

What  was  the  impact  on  society  beyond  science  and  technology? 

Nothing to report

4. CHANGES/PROBLEMS:

Changes in approach and reasons for change

We proposed to systematically assess the antitumor effect of SChLAP1 ASOs in vitro and in vivo,
but we did not observe a significant effect of SChLAP1 ASOs on proliferation and invasion in
vitro. Thus, without strong in vitro evidence, we did not further test the anti-tumor effect of
SChLAP1 ASOs in vivo.

Actual  or  anticipated  problems  or  delays  and  actions  or  plans  to  resolve  them 

Since the SChLAP1 ASOs did not exhibit a strong anti-tumor effect in vitro, we started to
employ the CRISPR technique to determine the oncogenic function of SChLAP1 in LNCaP cells. As
shown in Figure 12, we generated three monoclonal heterozygous SChLAP1 deletion lines
(No. 6, No. 11 and No. 15). The copy number of SChLAP1 in each clone was determined by smFISH
(Figure 12A), and we found that the proliferation of all SChLAP1-loss clones was significantly
slower than that of WT clones. Next, we plan to profile the transcriptomic changes in
SChLAP1-loss clones and WT clones by RNA-seq.


[Figure 12, panels A and B; axis label: # Nuclear Foci / cell; clones: No. 6 (+/-),
No. 11 (+/-), No. 15 (+/-), No. 17 (WT), No. 21 (WT), No. 22 (WT)]

Figure 12. Loss of function of SChLAP1 in LNCaP cells by the CRISPR technique. A, The copy
number of SChLAP1 in SChLAP1-loss clones and WT clones was quantified by smFISH assay. B, The
proliferation of SChLAP1-loss clones and WT clones was examined by CellTiter-Glo assay.


Changes  that  had  a  significant  impact  on  expenditures 

We performed cell proliferation and invasion assays to determine the antitumor effect of
SChLAP1 ASOs. We found that even though the knockdown efficiency of the SChLAP1 ASOs was
similar to that of SChLAP1 siRNA or shRNA, the ASOs did not phenocopy the effect of
SChLAP1 siRNA or shRNA on proliferation and invasion. We also proposed that
SChLAP1-expressing prostate cancer cells might be sensitive to the EZH2 inhibitor EPZ-6438, but
our in vitro and in vivo data suggested that EPZ-6438 alone did not effectively inhibit tumor
growth. Interestingly, we found that SChLAP1-expressing prostate cancer cells are sensitive to
the combination of EPZ-6438 and enzalutamide; thus, we will further test the strategy of
co-targeting EZH2 and AR in SChLAP1-expressing prostate cancer cells in vitro and in vivo.

Significant  changes  in  use  or  care  of  human  subjects,  vertebrate  animals,  biohazards, 
and/or  select  agents 

No  Changes 

Significant  changes  in  use  or  care  of  human  subjects: 

None 

Significant  changes  in  use  or  care  of  vertebrate  animals. 

None 

Significant  changes  in  use  of  biohazards  and/or  select  agents. 

None 


5.  PRODUCTS: 

Publications,  conference  papers,  and  presentations 
Journal  publications: 

Shukla  S*,  Zhang  X*,  Niknafs  YS*,  Xiao  L*,  Mehra  R,  Cieslik  M,  Ross  A,  Schaeffer  E,  Malik 
B,  Guo  S,  Freier  SM,  Bui  HH,  Siddiqui  J,  Jing  X,  Cao  X,  Dhanasekaran  SM,  Feng  FY, 
Chinnaiyan  AM,  Malik  R.  Identification  and  Validation  of  PCAT14  as  Prognostic  Biomarker  in 
Prostate  Cancer.  Neoplasia.  2016  Aug;18(8):489-99. 

Niknafs YS, Han S, Ma T, Speers C, Zhang C, Wilder-Romans K, Iyer MK, Pitchiaya S, Malik
R, Hosono Y, Prensner JR, Poliakov A, Singhal U, Xiao L, Kregel S, Siebenaler RF, Zhao
SG, Uhl M, Gawronski A, Hayes DF, Pierce LJ, Cao X, Collins C, Backofen R, Sahinalp CS,
Rae JM, Chinnaiyan AM, Feng FY. The lncRNA landscape of breast cancer reveals a role for
DSCAM-AS1 in breast cancer progression. Nat Commun. 2016 Sep 26;7:12791.

Other publications, conference papers, and presentations

Xiao L, Shukla S, Zhang X, Niknafs YS, Malik R, Chinnaiyan AM. Identification and
Validation of PCAT14 as Prognostic Biomarker in Prostate Cancer [abstract]. In:
Proceedings of the 104th Annual Meeting of the American Association for Cancer Research;
2013 Apr 6-10; Washington, DC. Philadelphia (PA): AACR;


Technologies  or  techniques 

Nothing  to  Report. 

Inventions,  patent  applications,  and/or  licenses 

Nothing  to  Report. 

Other  Products 

Nothing  to  Report. 

6.  PARTICIPANTS  &  OTHER  COLLABORATING  ORGANIZATIONS 


What  individuals  have  worked  on  the  project? 


Name: 


Lanbo Xiao - No change. 11.4 CM


Has  there  been  a  change  in  the  active  other  support  of  the  PD/PI(s)  or  senior/key  personnel 
since  the  last  reporting  period? 

None 

7.  SPECIAL  REPORTING  REQUIREMENTS 
None 

8.  APPENDICES:  Attach  all  appendices  that  contain  information  that  supplements, 
clarifies  or  supports  the  text.  Examples  include  original  copies  of  journal  articles, 
reprints  of  manuscripts  and  abstracts,  a  curriculum  vitae,  patent  applications,  study 
questionnaires,  and  surveys,  etc. 

1. Shukla S, et al., Neoplasia 2016

2. Niknafs YS, et al., Nature Communications 2016

3. Curriculum Vitae, Lanbo Xiao

Abstract

Rapid advances in the discovery of long noncoding RNAs (lncRNAs) have identified lineage- and
cancer-specific biomarkers that may be relevant in the clinical management of prostate cancer
(PCa). Here we assembled and analyzed a large RNA-seq dataset, from 585 patient samples,
including benign prostate tissue and both localized and metastatic PCa, to discover and validate
differentially expressed genes associated with disease aggressiveness. We performed Sample Set
Enrichment Analysis (SSEA) and identified genes associated with low versus high Gleason score in
the RNA-seq database. Comparing Gleason 6 versus 9+ PCa samples, we identified 99 differentially
expressed genes with variable association to Gleason grade as well as robust expression in
prostate cancer. The top-ranked novel lncRNA, PCAT14, exhibits both cancer and lineage
specificity. On multivariate analysis, low PCAT14 expression independently predicts for BPFS
(P = .00126), PSS (P = .0385), and MFS (P = .000609), with trends for OS as well (P = .056). An
RNA in-situ hybridization (ISH) assay for PCAT14 distinguished benign vs malignant cases, as well
as high vs low Gleason disease. PCAT14 is transcriptionally regulated by AR, and endogenous
PCAT14 overexpression suppresses cell invasion. Thus, using RNA-sequencing data we identify
PCAT14, a novel prostate cancer- and lineage-specific lncRNA. PCAT14 is highly expressed in
low-grade disease, and loss of PCAT14 predicts for disease aggressiveness and recurrence.

Neoplasia (2016) 18, 489-499

Introduction

Early detection of prostate cancer, largely facilitated by the advent of
PSA screening, has also been linked to over-diagnosis and
overtreatment of this disease [1-3]. While coupling PSA screening
with other biomarkers such as the long non-coding RNA (lncRNA)
transcript PCA3 or gene fusion events (such as TMPRSS2-ERG)
has increased the specificity of cancer diagnosis, these biomarkers have
limited utility in stratifying patients in terms of prognosis [4,3].
While  stratifying  patients  into  risk  groups  based  on  clinicopathologic 
features  is  currently  used  to  guide  treatment  decisions  [6],  it  is  clear 
that  current  stratification  approaches  need  to  be  further  refined  to 
allow  better  personalization  of  therapy.  Thus,  identifying  molecular 
biomarkers  to  distinguish  indolent  versus  aggressive  disease  would 
address  an  unmet  need  in  the  clinical  management  of  prostate  cancer. 

Advances in next-generation sequencing technologies have enabled
thorough characterization of cancer transcriptomes, especially in
unraveling the realm of non-coding RNAs (ncRNAs) [7,8]. In
particular, lncRNAs, a class of ncRNAs, have gained increasing
attention as biomarkers due to their tissue- and cancer-specific
expression profile [9]. In this study, we assembled and analyzed a large
RNA-seq compendium compiled from recent publications from
consortia such as The Cancer Genome Atlas (TCGA), the Prostate
Cancer Foundation/Stand Up to Cancer international team, and
others to identify differentially expressed genes (both protein-coding
and non-coding genes) that are associated with indolent versus
aggressive disease [10,11]. Our results identify PCAT14, a prostate
cancer- and lineage-specific lncRNA, as a top differentially expressed
gene in this context. We characterize PCAT14 preclinically and
demonstrate that it correlates inversely in expression with disease
aggressiveness and adds to conventional clinicopathologic risk factors
in predicting prognosis in prostate cancer patients. Finally, we
develop a novel in-situ hybridization (ISH)-based approach for
detecting PCAT14 in clinical samples.

Material  and  Methods 

RNA-Seq  Data  Set 

Prostate  RNA-seq  cohort  (n  =  585)  containing  52  benign  prostate 
tissues,  501  primary  prostate  cancers,  and  132  metastatic  prostate 
cancers  was  used  in  this  study.  For  nomination  of  Gleason  associated 
genes,  we  compared  low  Gleason  tumors  (Gleason  6,  n  =  45)  to  high 
Gleason  tumors  (Gleason  9+,  n  =  140). 

RNA-seq  Data  Processing 

TCGA  prostate  Fastq  files  were  obtained  from  the  CGhub.  Reads 
were  aligned  using  STAR  version  2.4.2  [12]  and  read  abundance  was 
calculated  using  FeatureCounts  version  1.4.6  [13]. 

RNA-Seq  Differential  Expression  Testing 

Differential  expression  testing  was  performed  using  the  Sample  Set 
Enrichment  Analysis  (SSEA)  tool  described  previously  [7].  Briefly, 
following  count  data  normalization,  SSEA  performs  the  weighted 
KS-test  procedure  described  in  GSEA  [14].  The  resulting  enrichment 
score  (ES)  statistic  describes  the  enrichment  of  the  sample  set  among 
all  samples  being  tested.  To  test  for  significance,  SSEA  enrichment 
tests  are  performed  following  random  shuffling  of  the  sample  labels. 
These  shuffled  enrichment  tests  are  used  to  derive  a  set  of  null 
enrichment  scores  (1000  null  enrichment  scores  computed).  The 
nominal p value reported is the relative rank of the observed
enrichment score within the null enrichment scores. Multiple
hypothesis testing is performed by comparing the enrichment score
of the test to the null normalized enrichment score (NES)
distributions for all transcripts in a sample set. This null NES
distribution is used to compute FDR q values in the same manner
used by GSEA [14]. The SSEA percentile score was determined by ranking
the genes in each analysis by their NES score.
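
The label-shuffling step described above can be illustrated with a minimal sketch. This is not the SSEA tool itself: it is a simplified stand-in that assumes a GSEA-style weighted running-sum statistic for a single transcript, omits count normalization, NES computation and FDR estimation, and adds a +1 pseudo-count to the empirical p value (an assumption made here to avoid reporting p = 0). The function names are hypothetical.

```python
import random


def enrichment_score(values, in_set, weight=1.0):
    """GSEA-style weighted running-sum statistic for one transcript.

    values: expression value per sample.
    in_set: parallel list of booleans marking membership in the sample set.
    """
    # Rank samples from highest to lowest expression.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    hit_total = sum(abs(values[i]) ** weight for i in order if in_set[i])
    miss_total = sum(1 for flag in in_set if not flag)
    if hit_total == 0 or miss_total == 0:
        return 0.0
    running, best = 0.0, 0.0
    for i in order:
        if in_set[i]:
            running += abs(values[i]) ** weight / hit_total
        else:
            running -= 1.0 / miss_total
        # Keep the maximum deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(values, in_set, n_perm=1000, seed=0):
    """Nominal p value: rank of the observed score among label-shuffled scores."""
    rng = random.Random(seed)
    observed = enrichment_score(values, in_set)
    labels = list(in_set)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random shuffling of the sample labels
        if abs(enrichment_score(values, labels)) >= abs(observed):
            extreme += 1
    # +1 pseudo-count is an assumption of this sketch, not taken from the text.
    return observed, (extreme + 1) / (n_perm + 1)
```

The default of 1000 shuffles mirrors the number of null enrichment scores stated in the text; everything else in the sketch is illustrative.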

Tissue  Expression  Heatmap  Generation 

The  “gplots”  R-package  was  used  to  generate  heatmaps  using  the 
heatmap.2  function.  Expression  was  normalized  as  log2  of  the 
fold-change  over  the  median  of  the  normal  samples  for  each 
transcript.  Unsupervised  hierarchical  clustering  was  performed  with 
the  hclust  function,  using  Pearson  correlation  as  the  clustering 
distance,  using  the  “ward”  agglomeration  method. 
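
The normalization and distance described above can be sketched in a few lines. The authors used R (gplots and hclust); the Python below is only an illustrative restatement under assumptions: the pseudocount that keeps the logarithm defined at zero expression is not mentioned in the text, and the function names are hypothetical. Hierarchical clustering itself is not reproduced here.

```python
import math
from statistics import median


def log2_fold_over_normal_median(expr, is_normal, pseudocount=1.0):
    """log2 fold-change of each sample over the median of the normal samples.

    expr: expression values for one transcript across samples.
    is_normal: parallel booleans marking normal (benign) samples.
    pseudocount: assumption of this sketch, avoids log2(0).
    """
    ref = median(v for v, normal in zip(expr, is_normal) if normal)
    return [math.log2((v + pseudocount) / (ref + pseudocount)) for v in expr]


def pearson_distance(x, y):
    """Clustering distance defined as 1 - Pearson correlation.

    Raises ZeroDivisionError if either vector has zero variance.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)
```

With this distance, perfectly correlated profiles are at distance 0 and perfectly anti-correlated profiles at distance 2.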

Identification  of  Genes  Differentially  Expressed  in  Prostate 
Cancer  of  Varying  Gleason  Score 

Differentially expressed Gleason-associated genes were identified as any
gene with an SSEA FDR < 0.01 when comparing Gleason 6 primary
tumors to Gleason 9+ primary tumors. Filtering for expression levels in
tissues was done by enforcing that each gene had >5 FPKM expression in
the top 5% of prostate tumor samples. Filtering for overexpression in
cancers versus normal was done by enforcing an SSEA FDR of <0.0001
in an analysis comparing the TCGA prostate cancer vs normal tissues.
Tissue specificity percentile was determined as the SSEA percentile for
each gene in an SSEA analysis comparing the TCGA prostate samples to
all other TCGA tumors in our multi-tissue compendium [7].
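
The three filters above can be expressed as a simple predicate. The sketch below is illustrative only: the dictionary keys are hypothetical, and the expression criterion is interpreted here as "every sample in the top 5% of tumor samples exceeds 5 FPKM", which is one plausible reading of the text rather than a confirmed detail of the authors' implementation.

```python
def top_fraction_min(values, fraction=0.05):
    """Smallest value among the top `fraction` of samples (at least one sample)."""
    k = max(1, int(len(values) * fraction))
    return sorted(values, reverse=True)[k - 1]


def passes_filters(gene):
    """Apply the three thresholds stated in the text to one gene record.

    gene: dict with hypothetical keys
      'gleason_fdr'          - SSEA FDR, Gleason 6 vs Gleason 9+ primary tumors
      'tumor_fpkm'           - list of FPKM values across prostate tumor samples
      'cancer_vs_normal_fdr' - SSEA FDR, prostate cancer vs normal tissues
    """
    return (
        gene["gleason_fdr"] < 0.01
        and top_fraction_min(gene["tumor_fpkm"]) > 5.0
        and gene["cancer_vs_normal_fdr"] < 0.0001
    )
```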

Clinical  Analysis 

To assess the prognostic value of PCAT14, microarray data were
obtained from the Johns Hopkins University (JHU) (N = 355). Patients
were treated with prostatectomy and subsequently received no adjuvant
or salvage treatment until metastasis. Microarray processing and
normalization were performed as described previously [15]. PCAT14
expression was calculated by taking the mean expression of probe sets
mapping to exons. High/low PCAT14 was determined by splitting on the
median expression level. Kaplan-Meier curves are shown and statistical
inference was performed using the log-rank test. Multivariate analysis
was performed using Cox regression. Age was treated as a continuous
variable. PSA was grouped into low (<10 ng/ml), intermediate
(10-20 ng/ml), and high (>20 ng/ml). Surgical margin status (SMS),
seminal vesicle invasion (SVI), extracapsular extension (ECE), and lymph
node invasion (LNI) were treated as binary variables. Gleason score was
grouped into low (<7) or high (8-10). Association of PCAT14 and
clinicopathologic variables was evaluated using a t-test for continuous
variables, and a chi-squared test for categorical variables. Statistical
significance was set as a two-sided P-value <0.05. All analyses were
performed in R 3.1.2.
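
The grouping rules in this subsection (median split for PCAT14, three PSA categories) can be sketched as follows. This is an illustrative Python example rather than the R code used for the analysis; the function names are hypothetical, and assigning samples exactly at the median to the "low" group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def split_on_median(expression):
    """Label each sample 'high' if above the cohort median, else 'low'.

    Samples exactly at the median are labelled 'low' (an assumption).
    """
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def psa_group(psa_ng_ml):
    """PSA categories as stated in the text: <10, 10-20, >20 ng/ml."""
    if psa_ng_ml < 10:
        return "low"
    if psa_ng_ml <= 20:
        return "intermediate"
    return "high"


# Hypothetical expression values for four patients.
labels = split_on_median([1.0, 4.0, 2.0, 3.0])
print(labels)
print([psa_group(v) for v in (9.9, 10, 20, 20.1)])
```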

ISH  Analysis 

PCAT14 ISH was performed on thin (approximately 4 μm thick)
TMA sections (Advanced Cell Diagnostics, Inc., Hayward, CA), as
described previously [16,17]; in parallel, PCAT14 ISH was performed
on previously identified positive and negative control index
formalin-fixed paraffin embedded (FFPE) tissue sections. All slides
were examined for PCAT14 ISH signals in morphologically intact
cells and scored manually by a study pathologist (Rohit Mehra).
Specific PCAT14 ISH signal was identified as brown, punctate dots,
and expression level was scored as follows: 0 = no staining or less than
1 dot per 10 cells, 1 = 1 to 3 dots per cell, 2 = 4 to 9 dots per cell (few
or no dot clusters), 3 = 10 to 14 dots per cell (less than 10% in dot clusters), and
4 = greater than 13 dots per cell (more than 10% in dot clusters). For each
evaluable tissue core, a cumulative ISH product score was calculated as the sum of
the individual products of the expression level (0 to 4) and percentage of cells (0 to
100) (i.e., [A% x 0] + [B% x 1] + [C% x 2] + [D% x 3] + [E% x 4];
total range = 0 to 400). For each tissue sample, the ISH product score was
averaged across evaluable TMA tissue cores. All quantitative data are shown as
mean ± S.D. The significance of the difference between two groups was
assessed by two-sided t test using GraphPad Prism 6.02 software.
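
As a worked illustration of the ISH product score defined in this subsection, the short Python sketch below computes the score for individual cores and averages it per sample. The function names and the example percentages are hypothetical, and the check that the five percentages sum to 100 is an assumption inferred from the stated 0 to 400 range.

```python
def ish_product_score(percent_by_level):
    """Sum of (expression level x percentage of cells) for one tissue core.

    percent_by_level holds the percentage of cells scored 0, 1, 2, 3 and 4,
    in that order (A% through E% in the text).
    """
    if len(percent_by_level) != 5:
        raise ValueError("expected percentages for levels 0 through 4")
    if abs(sum(percent_by_level) - 100) > 1e-9:
        # Assumption: every evaluated cell is assigned exactly one level.
        raise ValueError("percentages should sum to 100")
    return sum(level * pct for level, pct in enumerate(percent_by_level))


def sample_score(cores):
    """Average the product score across the evaluable cores of one sample."""
    scores = [ish_product_score(core) for core in cores]
    return sum(scores) / len(scores)


# Hypothetical sample with two evaluable cores.
core_1 = [10, 20, 30, 30, 10]   # 0 + 20 + 60 + 90 + 40 = 210
core_2 = [50, 50, 0, 0, 0]      # 0 + 50 = 50
print(ish_product_score(core_1), ish_product_score(core_2))
print(sample_score([core_1, core_2]))
```

A core in which every cell scores 4 gives the maximum of 400, and a core with no staining gives 0, matching the range stated in the text.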

Cell Lines, Tissues and Reagents

All  prostate  cell  lines  used  in  this  study  were  purchased  from  the  American 
Type  Culture  Collection  (ATCC),  cultured  according  to  their  recommendations 
and  were  periodically  checked  for  mycoplasma  contamination  and  genotyped  to 
confirm identity. For androgen treatment experiments, VCaP cells were
pre-cultured in androgen-free charcoal-stripped medium for 48 hours and treated
with 10 nM dihydrotestosterone (DHT), 10 μM MDV3100, or vehicle
(ethanol) for the indicated time points before cells were harvested for RNA isolation.
For drug treatment experiments, LNCaP cells were treated with 5-20 μM of the
DNA methylation inhibitor 5-aza-2'-deoxycytidine (5-aza) (catalog:
A3656-5MG, Sigma) or DMSO for 5 days. RNA was isolated 24 h after
drug treatment and expression was analyzed by qRT-PCR.

Prostate  specimens  were  acquired  from  the  patients  who  underwent 
radical  prostatectomy  and  from  the  Rapid  Autopsy  Program  at  the  tissue 
core  of  University  of  Michigan  as  part  of  the  University  of  Michigan 
Prostate  Cancer  Specialized  Program  Of  Research  Excellence 
(S.P.O.R.E.). Informed consent was obtained from each patient.

RNA  Isolation  and  qPCR  Analysis 

Total  RNA  was  extracted  using  Trizol  reagent  and  an  RNeasy  Micro  Kit 
(Qiagen)  with  DNase  I  digestion  according  to  the  manufacturer's 
protocols.  RT-PCR  was  performed  from  total  RNA  using  Superscript  III 
(Invitrogen)  with  random  primers  (Invitrogen).  Quantitative  PCR  (qPCR) 
was  performed  using  Fast  SYBR  Green  Master  Mix  (Applied  Biosystems) 
on  a  7900HT  Fast  Real-Time  PCR  system  (Applied  Biosystems).  All 
oligonucleotide primers were purchased from Integrated DNA Technologies
(Coralville, IA), and the sequence of each primer is listed in Supplementary
Table 4. Primer specificity was determined by sequence verifying the PCR
products using the University of Michigan Sequencing Core facility.

Rapid Amplification of cDNA Ends (RACE)

5' and 3' RACE was performed using the GeneRacer RLM-RACE
kit (Invitrogen) according to the manufacturer's protocols. RACE
PCR products obtained using Platinum Taq high-fidelity polymerase
(Invitrogen) were resolved on a 1.5% agarose gel. Individual bands
were gel purified using a Gel Extraction kit (Qiagen), cloned into
pCR4-TOPO vector, and sequenced using M13 primers.

Knock  Down  Studies 

MDA-PCa-2b  and  VCaP  cells  were  seeded  in  biocoated  6-well 
plates  at  60%  confluency,  incubated  overnight,  and  transfected  with 
50 nM siRNAs targeting different exons of PCAT14 or non-targeting
siRNAs, using RNAi MAX reagent (Invitrogen) per manufacturer's
instructions. RNA was harvested 48 h after transfection. Functional
experiments were performed at indicated time points. Sequences of all
the siRNAs used are shown in Supplementary Table 4.

Nuclear-Cytoplasmic Subcellular Fractionation

Nuclear-cytoplasmic fractionation of MDA-PCa-2b and VCaP cells was
performed using an NE-PER Nuclear and Cytoplasmic Extraction kit
(Thermo Scientific) following manufacturer's instructions, followed
by RNA isolation and qPCR analysis.

CRISPR Based Overexpression of PCAT14

Stable cell lines overexpressing PCAT14 endogenously were made using a
previously published protocol [18]. Briefly, guide RNAs targeting the promoter
region of PCAT14 (Supplementary Table 4) were designed using the online tool at
http://crispr.mit.edu/ and cloned into sgRNA-MS2 vector using the lenti
sgRNA(MS2) zeo backbone. Lentiviral particles expressing PCAT14
sgRNA-MS2 were generated by the University of Michigan vector core. To
generate LNCaP or PC3 cells overexpressing PCAT14, cells were first seeded
into a 100 mm dish and transduced with Lenti dCAS-VP64 (blasticidin) and
Lenti-MS2-p65-HSF1 (hygromycin) vectors. After 2 days, cells were selected
with 4 μg/ml Blasticidin and 200 μg/ml Hygromycin. Cells stably expressing
dCAS-VP64 and MS2-p65-HSF1 were then seeded in 6-well plates and
infected with PCAT14 sgRNA-MS2 lentivirus. After 24 hours, cells were
selected with triple antibiotics: 4 μg/ml Blasticidin, 200 μg/ml Hygromycin
and 800 μg/ml Zeomycin for 1 week. Expression of PCAT14 in these cells
was verified by qPCR.

In  Vitro  FluoroBlok  Tumor  Invasion  Assay 

The in vitro FluoroBlok Tumor Invasion Assay (BD) was performed as
previously described [19]. Briefly, after rehydration of the BD FluoroBlok
membrane, prostate cancer cells resuspended in 500 μl of serum-free RPMI medium
(PC3, 50,000 cells per well, or LNCaP, 100,000 cells per well) were
seeded into the apical chambers. 750 μl of RPMI medium containing 10% FBS
was added to the basal chamber as chemoattractant. Plates were then
incubated at 37 °C, 5% CO2 for 24 hours. Following incubation, medium
from the apical chambers was removed, and the inserts were transferred to a
24-well plate containing 500 μl/well of 4 μg/mL Calcein AM (Invitrogen) in
Hanks buffered saline. Plates were incubated for 1 hour at 37 °C, 5% CO2,
then pictures of invaded cells were taken using an inverted fluorescence
microscope (Olympus) and quantified with ImageJ software [20].

Oncomine Concepts Analysis of the PCAT14 Signature

Genes that positively correlated (R2 > 0.35, n = 591) with PCAT14 in
TCGA RNA-seq data were selected and uploaded into the Oncomine database
[21] as custom concepts (Supplementary Table 2). All the prostate cancer
concepts with odds ratio >2.0 and P-value <1 x 10^ were selected. For
simplicity, the top 4 concepts (based on odds ratios) were selected for representation.
We exported these results as the nodes and edges of a concept association
network and visualized the network using Cytoscape version 3.3.0. Node
positions were computed using the Edge-weighted force directed layout in
Cytoscape using the odds ratio as the edge weight. Node positions were subtly
altered manually to enable better visualization of node labels.
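
The first step of this analysis, selecting genes whose expression is positively correlated with PCAT14 above an R2 cutoff, can be sketched as below. This is an illustration under stated assumptions, not the procedure actually used: the text does not name the correlation measure, so Pearson correlation is assumed here, and the gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def positively_correlated(reference, candidates, r2_cutoff=0.35):
    """Names of genes with r > 0 and r squared above the cutoff."""
    selected = []
    for name, values in candidates.items():
        r = pearson_r(reference, values)
        if r > 0 and r * r > r2_cutoff:
            selected.append(name)
    return selected


# Hypothetical expression of the reference gene and three candidates
# across the same five samples.
reference = [1, 2, 3, 4, 5]
candidates = {
    "geneX": [2, 4, 6, 8, 10],   # perfectly correlated
    "geneY": [5, 4, 3, 2, 1],    # perfectly anti-correlated
    "geneZ": [1, 3, 2, 3, 1],    # uncorrelated
}
selected = positively_correlated(reference, candidates)
print(selected)
```

The sign check matters because R2 alone cannot distinguish positive from negative correlation; the sketch does not handle genes with zero variance.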

Statistics 

All quantitative data are shown as mean ± S.D. The
significance of the difference between two groups was assessed by
two-sided t test or ANOVA using GraphPad Prism 6.02 software.

Results 

Identification  of  Genes  Associated  With  Gleason  Grade  in 
Prostate  Cancer 

Comprehensive  molecular  characterization  of  common  cancer 
types  has  become  feasible  with  the  recent  availability  of  large  next 
generation  sequencing  datasets  on  tumor  tissues.  To  identify  genes 
(both  coding  and  non-coding)  associated  with  aggressive  prostate 
cancer, we assembled a large prostate RNA-seq cohort (n = 585)
containing 52 benign prostate tissues, 501 primary prostate cancers,
and 132 metastatic prostate cancers. We performed differential
expression testing utilizing a non-parametric tool we developed for
RNA-seq data called Sample Set Enrichment Analysis [7]. In order to
nominate the most intriguing biomarkers associated with aggressive
disease, we compared low Gleason tumors (Gleason 6, n = 45) to
high Gleason tumors (Gleason 9+, n = 140) and applied filters for
substantial expression in prostate tumor tissue (>5 FPKM in the top
5% of prostate samples), and significant differential expression in
prostate cancers versus normal (SSEA, FDR <0.0001) leaving
a total of 99 candidate genes (Figure 1A, Supplementary Table 1).
Interestingly, clustering analysis revealed signature expression
patterns specifically associated with low Gleason, high Gleason, and metastatic
status, and included both novel and previously characterized genes
(Figure 1B). CENPF and EZH2, protein coding genes with a known
association with high grade prostate cancer, were rediscovered through
this analysis [22,23]. Similarly, we rediscovered SChLAP1, a long
non-coding RNA (lncRNA) associated with aggressive prostate
cancers [13,17], in our analysis (Figure 1B). With the goal of
identifying potential biomarkers that distinguish indolent prostate
cancers, we focused on genes enriched in low grade disease that are
expressed highly in prostate tissue and that also show prostate cancer
and tissue specificity (Figure 1C). Interestingly, a lncRNA, PCAT14,
appeared to be one of the top low-Gleason-associated genes with
robust prostate tissue expression, substantial prostate tissue specificity,
and significant overexpression in prostate cancers versus normal
(Figure 1D). In fact, among all genes (coding and non-coding),
PCAT14 ranked among the top 5 in terms of expression level,
Gleason 6 versus 9+ association, and cancer versus normal association
(Figure 1D). Additionally, among the top 5 candidate genes,
PCAT14 was the only gene to exhibit striking prostate tissue
specificity, a particularly relevant metric for a potential biomarker
(Figure 1E). The remaining 4 genes exhibited variable prostate tissue
specificity (Supplementary Figure 1). PCAT14 is a poly-exonic gene
found within a gene desert on chromosome 22, with a striking
prostate cancer and lineage specific expression pattern across the
>10,000 TCGA cancer and normal tissue samples (Figure 1E). For
these reasons, we elected to pursue PCAT14 as a promising biomarker
that can identify low grade prostate cancer.

Figure 1. Identification of lncRNA PCAT14 as a prostate cancer biomarker. A. Schematic representation of the workflow utilized to identify
highly-expressed, prostate cancer specific genes associated with low-Gleason disease. B. Heatmap depiction of the lncRNA and protein
coding genes differentially expressed (n = 99) in the Gleason 6 versus 9+ analysis of TCGA prostate RNA-seq data. Relative
expression of these genes in benign and metastatic prostate cancer tissues [11] is also displayed alongside for comparison. Expression
is depicted as log2 of the fold-change over the median of the Gleason 6 samples for each gene. Patients are grouped by cancer progression/Gleason
score. Rows represent genes and columns represent samples. C. Scatterplot showing the expression level, prostate tissue specificity, and
prostate cancer association of protein coding (solid circles) and lncRNA (solid triangles) genes identified in 1A. Expression is represented by the
FPKM value for the 95th percentile prostate cancer sample. Cancer versus normal and prostate tissue specificity are represented by the
percentile score for each gene in an SSEA analysis. D. The top five Gleason 6 associated genes listed in the order of prostate tissue
specificity. E. Expression of PCAT14 across all cancer and normal tissue types represented in the TCGA. Inset shows genome browser view of
PCAT14 genomic location.


Neoplasia Vol. 18, No. 8, 2016


PCAT14 prognosticates prostate cancer Shukla et al. 493

Genomic Organization and Regulation of PCAT14

We collected multiple lines of evidence from both experimental
data and available annotations to consolidate the genomic organization
of PCAT14. Based on reads from RNA-seq data
assembled in the MiTranscriptome [7], we predicted the structure of
the PCAT14 transcript variants (Supplementary Figure 1A).
Additionally, as an independent approach to define the exon structure
of PCAT14, we performed rapid amplification of cDNA ends
(RACE) in two prostate cancer cell lines, VCaP and MDA-PCa-2b,
that express PCAT14 at high levels (Supplementary Figure 1B and
C). Our analyses show that the PCAT14 gene is located on
chr22q11.2 and contains 4 exons. Among the four transcript
isoforms, the 2.3 kb variant-1 demonstrates the highest expression
(Supplementary Figure 1D). Next, using published ChIP-seq data in
VCaP cells [24], we show that PCAT14 has all the histone marks
(H3K4me3, H3K36me3, H3K27ac) associated with actively transcribed
genes (Figure 2A). We further performed subcellular
fractionation followed by qPCR to show that PCAT14 is distributed
equally between nuclear and cytoplasmic compartments (Figure 2B).

Androgen receptor (AR) plays a major role in prostate cancer. To identify any potential
regulation of the PCAT14 gene by androgen, we assessed the presence of
AR peaks in the PCAT14 genomic region using AR ChIP-seq data
generated in VCaP cells [24] and saw significant AR peaks in the PCAT14
locus. Some of these peaks were enhanced upon treatment with
DHT and suppressed upon treatment with the AR antagonists
MDV3100 or bicalutamide (Figure 2C). To corroborate this finding,
we assessed the expression of the PCAT14 transcript in VCaP cells upon AR
stimulation. Similar to canonical AR targets such as KLK3 and
TMPRSS2, PCAT14 expression was also significantly elevated (four fold in
24 hours) upon DHT stimulation (Figure 2D) and suppressed by
MDV3100 treatment (Figure 2E). In another line of investigation, we
queried whether epigenetic regulation might play a role in the prostate cancer
and lineage specific expression of PCAT14 observed in tissue samples
(Figure 1E). Using a prostate cancer cell line (LNCaP) model, we show
significant elevation of PCAT14 expression upon treatment with
5-aza-2'-deoxycytidine (5-Aza), a DNA demethylating agent, suggesting a
potential role for promoter methylation in regulation of PCAT14
(Figure 2F). However, our attempt to capture this event in TCGA
tissue samples where Infinium 450K DNA methylation array data are
available was inconclusive, due to the lack of probes in the PCAT14
promoter region. Taken together, we show that PCAT14 is an AR target gene
that may also be subject to epigenetic regulation in prostate cancer.

Clinical Association of PCAT14

Having observed an inverse correlation of PCAT14 with Gleason
score (GS) in our RNA-seq cohort, we next assessed the association of
PCAT14 expression with clinical outcomes in prostate cancer. For this
analysis we first divided samples into 7 groups (benign, GS-6, GS-7
(3 + 4), GS-7 (4 + 3), GS-8, GS-9 and Mets) and examined the
expression of PCAT14 using two different datasets (TCGA and Taylor
et al.). We identified a significant decrease in PCAT14 expression as
Gleason grade increased in both cohorts (Figure 3A and B).
Importantly, in the large TCGA dataset, expression was significantly
different between GS-6 and all other groups except GS-7 (3 + 4). We
next assessed the diagnostic ability of PCAT14 to identify prostate
cancers versus normal. In both the TCGA and Taylor prostate cancer
cohorts, PCAT14 expression was able to significantly distinguish cancer
from normal, with AUCs of 0.837 and 0.823, respectively (Figure 3C),
supporting its utility as a diagnostic biomarker.
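
For readers unfamiliar with the AUC values reported here, the following Python sketch shows one standard way to compute an AUC from expression values in two groups, using its equivalence to the Mann-Whitney statistic. It is an illustration only: the function name and the expression values are hypothetical, and the text does not state which software produced the reported AUCs.

```python
def roc_auc(cancer_values, normal_values):
    """AUC as the probability that a cancer sample outranks a normal one.

    Every cancer/normal pair is compared; ties count as half a win,
    which matches the Mann-Whitney U formulation of the AUC.
    """
    wins = 0.0
    for c in cancer_values:
        for n in normal_values:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cancer_values) * len(normal_values))


# Hypothetical expression values (arbitrary units).
cancer = [5.1, 6.3, 2.0, 7.4]
normal = [1.2, 2.0, 3.3]
auc = roc_auc(cancer, normal)
print(auc)
```

An AUC of 0.5 corresponds to no discrimination between groups and 1.0 to perfect separation, so values above 0.8 indicate that a randomly chosen cancer sample usually has higher expression than a randomly chosen normal sample.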

Using an alternate approach to further characterize the clinical
associations of PCAT14, we performed a “guilt-by-association”
analysis, assessing the clinical significance of the protein-coding
genes most correlated with PCAT14 (Supplementary Table 2) in the
TCGA prostate cancer cohort, leveraging cancer microarray data from
the Oncomine resource [21]. As expected, genes positively correlated
with PCAT14 were upregulated in the cancer vs normal analysis and were
downregulated in clinically advanced prostate cancer (Figure 3D).
Interestingly, we found a striking association of PCAT14 correlated
genes with concepts related to better prognosis (Figure 3D), and these
genes were under-expressed in recurrent and hormone refractory
prostate cancer, suggesting that PCAT14 may be a marker of better clinical
outcomes in prostate cancer. In contrast, genes that positively correlated
with SChLAP1, a lncRNA known to be associated with clinically
aggressive prostate cancer, were found to be overexpressed in advanced
prostate cancer as well as in cancer with poor outcomes [15,17].

Figure 2. Subcellular localization and regulation of PCAT14. A. Genome browser view of the PCAT14 locus. ChIP-seq tracks for H3K4me3,
H3K27ac, H3K36me3 and Pol-II generated in prostate cancer VCaP cells are shown. Prostate RNA-seq reads, transcript schematic based
on RACE results and Refseq, GENCODE, MiTranscriptome assembly annotations are also provided. Solid blocks indicate exons, thin
lines indicate introns, and arrows indicate the genomic orientation. B. Bar plots represent the subcellular localization of PCAT14 in prostate cell lines.
PCAT14 transcript was found equally in both cytoplasmic (red) and nuclear (blue) compartments in both MDA-PCa-2b and VCaP cell lines.
GAPDH and U1 RNA were used as controls. C. Genome browser view of the PCAT14 genomic locus for AR ChIP-seq data tracks obtained
from VCaP cells treated with either vehicle (black) or dihydrotestosterone (DHT) alone (red) or combinations (dark blue) including
DHT + MDV3100 and DHT + Bicalutamide. Significant AR binding observed in each data track is represented as peaks. D-E.
Histograms represent the expression of PCAT14, TMPRSS2 and KLK3 in VCaP cells after treatment with 10 nM DHT or with MDV3100 for
indicated time points. F. Bar plots represent re-expression of PCAT14 and GSTP1 in LNCaP cells after treatment with 5-Aza deoxycytidine
(5-Aza) for 5 days at indicated concentrations.

To further investigate the association of PCAT14 with favorable
clinical outcomes in prostate cancer, we performed Cox regression
analysis on a cohort of 355 patients (Johns Hopkins University (JHU)
cohort) who did not receive treatment prior to metastasis (median
follow-up 9 years). Univariate analysis showed that high PCAT14
expression was significantly associated with better
BPFS (P = .000062; HR = 0.59 [0.45-0.76]), MFS (P = .00016;
HR = 0.46 [0.32-0.66]), PSS (P = .0067; HR = 0.47 [0.27-0.82])
and OS (P = .022; HR = 0.57 [0.35-0.93]) (Figure 4A-D). In a Cox
multivariate analysis including clinicopathologic variables, PCAT14
stands out as a significant independent predictor of PSS (P = .0385;
HR = 0.55 [0.31-0.97]), MFS (P = .000609; HR = 0.52 [0.36-0.76])
and BRFS (P = .00126; HR = 0.64 [0.49-0.84]), with borderline
significance for OS (Table 1, Supplementary Table 3). In addition, we
also analyzed the association of PCAT14 expression with clinical outcome
in two independent data sets of 140 (Taylor et al.) and 377 (TCGA)
patients using the statistical approaches mentioned above [23]. Similar
to the JHU cohort, high PCAT14 expression predicted better BRFS
(Figure 4E) and MFS (Figure 4F). We also show that high PCAT14
expression was a predictor of better prognosis in lower Gleason grade
samples (Supplementary Figure 3B).
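
The survival comparisons above rest on Kaplan-Meier curves for the high and low PCAT14 groups. As a reminder of how such a curve is built, the Python sketch below implements the standard product-limit estimator. It is a teaching example with hypothetical follow-up data, not the R code used for the reported analyses, and it does not perform the log-rank test or Cox regression.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if the event occurred at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients with the event at t, and all patients leaving at t.
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (years) for four patients; the second is censored.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(curve)
```

Censored patients reduce the number at risk without lowering the curve, which is why the drop at year 3 in this example is computed over two patients rather than three.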


Figure 3. PCAT14 is a marker of low grade tumors. A-B. Expression of PCAT14 in samples distinguished by Gleason grade in the TCGA (A) and
Taylor (B) cohorts. (* = P < .05, ** = P < .01, **** = P < .0001; compared to Gleason 6). C. ROC analysis of PCAT14 expression in
the TCGA and Taylor cohorts. D. Network representation of genes positively correlated with PCAT14 in localized prostate cancers using
Oncomine concepts analysis and visualized with the Force-Directed Layout algorithm in the Cytoscape tool [29]. Node names are
assigned according to the author of the primary study [25,30-38]. Nodes are colored according to the concept categories indicated in the
figure legend. Thickness of the edges implies higher odds ratio.


PCAT14 Expression In Situ

LncRNA detection in cancer tissue sections by RNA in situ
hybridization (RNA-ISH) technology has similar clinical utility to
immunohistochemical evaluation of protein biomarkers [16,26].
Hence we evaluated PCAT14 transcript levels in PCa FFPE tissues
using specific probes to perform RNA-ISH. We first probed a panel
of FFPE sections derived from either murine prostate, kidney, lung
(negative controls) or xenografts from MDA-PCa-2b cells, a cell line
that expresses PCAT14 at high levels (positive control). As expected,
high levels of specific signal were present in MDA-PCa-2b xenografts
while no expression/staining was seen in the negative control murine
tissues (Supplementary Figure 5A and B). Consistent with the cell
fractionation data, expression of PCAT14 was seen in both nuclear
and cytoplasmic compartments. Next we obtained frozen and
matched formalin fixed paraffin embedded (FFPE) tissue sections
derived from a patient radical prostatectomy specimen with Gleason
score 3 + 3 = 6 disease. qPCR analysis on cDNA from frozen tissues
derived from this specimen showed a 7-8 fold increase in PCAT14
expression in cancer compared to the adjacent benign tissue (Figure 5A).
RNA-ISH also demonstrated that PCAT14 is differentially expressed in
PCa, as we saw a striking difference in transcript expression, with
high signals located in the prostatic adenocarcinoma glands and
no/minimal staining in the benign section (Figure 5B). To further
expand these results, we performed RNA-ISH on a PCa tissue
microarray (TMA, n = 129) (Figure 5C) and found that PCAT14
expression was able to distinguish tumor from normal (AUC 0.863)
(Figure 5D) and was high in Gleason 6 with minimal expression noted
in benign tissue or Gleason 8 disease (Figure 5E).


Functional  Evaluation  of  P CAT  14 

Since  expression  of  PCAT14  was  lower  in  high  grade  prostate 
cancer  and  its  expression  predicted  better  outcomes,  we  hypothesize 
that  PCAT14  may  have  tumor  suppressive  effects.  To  test  this 
hypothesis,  we  performed  overexpression  studies  in  PC3  and  LNCaP 
cells,  prostate  cancer  cell  lines  that  do  not  express  PCAT14 
(Supplementary  Figure  2B,  C).  To  overexpress  PCAT14 ,  we  used  a 
CRISPR  (clustered  regularly  interspaced  short  palindromic  repeat) - 
Cas9  Synergistic  Activation  Mediator  (SAM)  complex  [18].  This 
method  allows  endogenous  overexpression  of  a  gene  by  recruiting 
artificial  transcriptional  factors  to  the  promoter  using  single-guide 
RNA  (sgRNA-MS2)  (See  method  section  for  details).  We  designed  6 
sgRNAs  targeting  the  PCAT14  promoter  and  tested  their  ability  to 
induce  PCAT14  expression  using  HEK293  cells  stably  expressing 
transcription  factors.  We  found  three  sgRNAs  that  significantly 
increased  PCAT14  expression  in  HEK293  cells  (Supplementary  Figure 
5A).  We  next  used  these  sgRNA  to  construct  PC3  and  LNCaP  cells  stable 
expressing  PCAT14  (Figure  6A).  Using  two  independent  sgRNAs  we 
were  able  to  achieve  500  to  1000-fold  endogenous  overexpression  of 
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Figure  4.  PCAT14  is  a  prognostic  biomarker.  A-D.  Kaplan-Meier  analyses  of  prostate  cancer  outcomes  in  the  John  Hopkins  cohort. 
PCAT14  expression  was  measured  using  Affymetrix  exon  arrays,  and  subjects  were  stratified  according  to  their  PCAT14  expression  level. 
Subject  outcomes  were  analyzed  for  biochemical  progression  (D)  and  Metastasis  free  survival  (E),  Prostate  cancer-specific  survival  (F)  and 
overall  survival  (G).  Subject  outcomes  were  analyzed  for  Kaplan-Meier  curves,  P  values  determined  using  a  log-rank  test.  E-F.  Kaplan- 
Meier  analyses  of  biochemical  progression  free  survival  in  the  Taylor  (E)  and  Metastasis  Free  survival  in  the  TCGA  (F)  cohorts  of  prostate 
cancer.  Patients  were  divided  into  two  groups  based  on  the  expression  level  of  PCAT14.  P  values  for  Kaplan-Meier  curves  were 
determined  using  a  log-rank  test. 


PCAT14  in  PC3  cells  (Figure  GB)  and  20—100  fold  overexpression  in 
LNCaP  cells  (Supplementary  Figure  3 B).  While  we  observed  no 
significant  effect  of  PCAT14  overexpression  on  proliferation  of  PC3  or 
LNCaP  cells  (Figure  6C  and  Supplementary  Figure  5 Q,  overexpression 


of  PCAT14  lead  to  suppression  of  invasion  capacity  of  both  PC3  and 
LNCaP  cells  (Figure  6C,  Z);  Supplementary  Figure  5E,  F),  in  line  with  its 
prior  identified  association  with  clinically  indolent  disease.  We  then 
looked  at  the  effects  of  PCAT14  knockdown  on  cell  expressing 


Table 1. Multivariate Analysis in JHU Cohort

                          Biochemical Recurrence      Metastasis                  Prostate Cancer             Overall
                          Free Survival               Free Survival               Free Survival               Survival
Variable                  P-Value   HR [95% CI]       P-Value   HR [95% CI]       P-Value   HR [95% CI]       P-Value   HR [95% CI]
PCAT14 High vs. Low       .00126    0.64 [0.49-0.84]  .000609   0.52 [0.36-0.76]  .0385     0.55 [0.31-0.97]  .0567     0.62 [0.38-1.01]
Age                       .818      1 [0.98-1.02]     .65       0.99 [0.96-1.02]  .338      0.98 [0.93-1.02]  .151      0.97 [0.93-1.01]
PSA Int vs. Low           .241      0.83 [0.62-1.13]  .353      0.83 [0.55-1.24]  .385      0.75 [0.4-1.42]   .366      0.77 [0.44-1.35]
PSA High vs. Low          .916      0.98 [0.63-1.52]  .574      0.84 [0.47-1.52]  .463      0.73 [0.31-1.7]   .582      0.81 [0.39-1.7]
Gleason High vs. Low      2.98E-05  1.83 [1.38-2.43]  1.00E-08  3.08 [2.1-4.52]   .000224   3.1 [1.7-5.65]    .000988   2.38 [1.42-3.99]
Seminal vesicle invasion  .0042     1.52 [1.14-2.03]  .453      1.16 [0.79-1.69]  .774      0.92 [0.51-1.66]  .82       0.94 [0.56-1.59]
Surgical margin status    .000533   1.78 [1.28-2.47]  .000276   2.15 [1.42-3.25]  .0487     1.93 [1-3.7]      .0825     1.67 [0.94-2.99]
Extracapsular extension   .456      1.14 [0.81-1.58]  .459      1.21 [0.73-2.03]  .636      0.83 [0.39-1.77]  .816      0.93 [0.48-1.77]
Lymph node invasion       8.98E-12  3.23 [2.31-4.52]  .000164   2.21 [1.46-3.35]  .0616     1.86 [0.97-3.57]  .254      1.42 [0.78-2.6]

HR: Hazard Ratio; CI: Confidence Interval
Figure 5. PCAT14 RNA-ISH in prostate cancer tissues. A. Barplot to show the expression of PCAT14 in tumor tissue and adjacent benign
by qRT-PCR. B. A representative PCAT14 RNA in situ hybridization image. White arrows indicate Gleason score 6 disease and black
arrows indicate benign glands. C. Representative PCAT14 in situ hybridization images of human prostate cancer samples of different
Gleason grades. D. ROC analysis of PCAT14 expression in the prostate TMAs. E. Representation of mean PCAT14 ISH product score for
benign prostatic glands (benign), Gleason score 6, Gleason score 3 + 4 = 7, Gleason score 4 + 3 = 7 and Gleason score 8+ clinically
localized prostate cancer in a TMA cohort. (** = P < .01; compared to benign).


PCAT14 at high levels (VCaP and MDA-PCa-2b). In both
MDA-PCa-2b and VCaP cells, using 2 independent siRNAs as well as
8 independent ASOs, we were able to achieve more than 80%
knockdown efficiency (Supplementary Figure 5G-J). However, we did
not observe a consistent effect on cell proliferation or cell invasion
(Supplementary Figure 5K-N and data not shown).

Discussion 

In  this  study,  we  perform  a  large-scale  RNA-sequencing-based 
analysis  of  biomarkers  associated  with  indolent  versus  aggressive 
prostate  cancer  and  identify  the  long  noncoding  RNA  PCAT14  as  a 
marker  of  low  grade  and  indolent  disease.  We  define  the  exon 
structure  of  PCAT14  and  demonstrate  that  PCAT14  is  an 
AR-regulated lncRNA. Using two independent data sets, we show
that  PCAT14  is  highly  upregulated  in  prostate  cancer  compared  to 
benign  tissue  and  is  able  to  distinguish  prostate  cancer  from  normal 
tissue  with  high  sensitivity  and  specificity,  suggesting  that  PCAT14 
can  be  an  excellent  diagnostic  biomarker.  Moreover,  we  demonstrate 
that  expression  of  PCAT14  is  prognostic  of  outcome  and  is  associated 


with  better  biochemical  progression-free  survival,  metastases-free 
survival,  and  prostate  cancer-specific  survival.  Importantly,  we  find 
that  PCAT14  expression  is  a  prognostic  biomarker  which  adds  to 
standard  clinicopathologic  variables. 

As such, PCAT14 represents a unique biomarker. Most diagnostic
biomarkers, such as PCA3, can distinguish cancer from normal tissue
but are not prognostic [4]. Conversely, many prognostic biomarkers,
such as Ki-67, hold little diagnostic value. It is unclear why PCAT14
expression increases significantly during the initial formation of
cancer but subsequently decreases as disease aggressiveness
increases; this observation requires follow-up with further
mechanistic studies, but it is also a feature that gives PCAT14 value as a
biomarker across multiple clinical contexts. Of note, PCAT14 was
also found to be expressed in testicular cancer samples along with
prostate cancer, suggesting a possible role for PCAT14 in testicular
cancer pathogenesis. However, owing to the lack of normal testis samples in
the TCGA database, it is unclear at this point whether PCAT14 is
differentially regulated in testicular cancer compared to normal testis.
Recently, the Genotype-Tissue Expression (GTEx) program has
generated a large amount of high-throughput sequencing data on
normal tissues, including testis [27]. These data would be useful for
examining the role of PCAT14 in testicular carcinoma.

Figure 6. Functional analysis of PCAT14. A. Schematic representation of the workflow to endogenously overexpress PCAT14 in prostate
cancer cells using the CRISPR/SAM system. B. Bar plots represent the fold increase in PCAT14 level in PC3 cells expressing dCas9-VP64 and
MS2-p65-HSF1 with control or 2 independent PCAT14 sgRNAs. C. Bar plot represents quantification of invaded PC3 cells with or without
PCAT14 expression. D. Representative images of invaded PC3 cells with or without PCAT14 expression.

In an attempt to develop a clinical-grade assay to detect expression
of PCAT14, we developed a novel assay, using ISH probes, which can
be applied to formalin-fixed, paraffin-embedded tissues. This ISH
assay provides an opportunity to validate our findings in larger
cohorts with associated clinical data in the future. Ultimately, an
optimized approach for predicting indolent versus aggressive disease
will integrate clinicopathologic parameters with
molecular biomarkers. It is likely that this molecular assay will
involve multiplexing multiple biomarkers, and may require combining
both tissue-based and urine-based biomarkers. Intriguing
subsequent studies include the assessment of PCAT14 and other
candidate lncRNAs, in addition to PCA3, as urine biomarkers.

There are several limitations to our study. While we demonstrate
the  potential  value  of  PCAT14  expression  as  a  biomarker,  it  is  unclear 
how  PCAT14  is  modulating  oncogenic  phenotypes,  from  a 
mechanistic  perspective.  Additionally,  while  we  demonstrate  the 
relative  specificity  of  PCAT14  for  both  prostate  and  testicular  cancers, 
the  molecular  basis  underlying  this  specificity  remains  to  be 
elucidated.  It  is  known  that  AR  can  regulate  expression  of  genes  in 
both  prostatic  and  testicular  tissues,  but  we  do  not  know  whether  the 
relative  cancer-specificity  can  be  attributed  to  AR.  Clearly,  these  are 
important  areas  for  future  study. 

Overall,  our  study  highlights  the  need  to  look  at  both  conventional 
protein-coding  genes  and  noncoding  genes  in  the  search  for  optimal 
biomarkers.  To  our  knowledge,  there  are  approximately  20,000 
protein  coding  genes  [28],  which  comprise  2%  of  the  genome.  Given 
our  recent  study  demonstrating  that  there  are  close  to  60,000  long 
noncoding RNAs (lncRNAs) [7], many of which are specific to
certain cancers, it is clear that these lncRNAs present a relatively
underexplored  frontier  for  biomarker  development,  and  that 
PCAT14  may  represent  an  initial  candidate  to  be  further  explored 
along  this  frontier. 

Conclusion 

By  performing  differential  expression  analysis  between  prostate  cancer 
with low vs high Gleason scores, we identified lncRNA PCAT14 as a
prostate cancer- and lineage-specific biomarker of indolent disease.
We  show  that  PCAT14  is  an  AR-regulated  transcript  and  its 
overexpression  suppresses  invasion  of  prostate  cancer  cells.  Moreover, 
in  multiple  independent  datasets,  PCAT14  expression  associates  with 
favorable  outcomes  in  prostate  cancer  and  adds  prognostic  value  to 
standard  clinicopathologic  variables. 

Supplementary data to this article can be found online at
http://dx.doi.org/10.1016/j.neo.2016.07.001.

The lncRNA landscape of breast cancer reveals
a role for DSCAM-AS1 in breast cancer progression
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Molecular  classification  of  cancers  into  subtypes  has  resulted  in  an  advance  in  our 
understanding  of  tumour  biology  and  treatment  response  across  multiple  tumour  types. 
However,  to  date,  cancer  profiling  has  largely  focused  on  protein-coding  genes,  which 
comprise  <1%  of  the  genome.  Here  we  leverage  a  compendium  of  58,648  long  noncoding 
RNAs (lncRNAs) to subtype 947 breast cancer samples. We show that lncRNA-based
profiling categorizes breast tumours by their known molecular subtypes in breast cancer.
We identify a cohort of breast cancer-associated and oestrogen-regulated lncRNAs,
and investigate the role of the top prioritized oestrogen receptor (ER)-regulated lncRNA,
DSCAM-AS1.  We  demonstrate  that  DSCAM-AS1  mediates  tumour  progression  and  tamoxifen 
resistance  and  identify  hnRNPL  as  an  interacting  protein  involved  in  the  mechanism  of 
DSCAM-AS1  action.  By  highlighting  the  role  of  DSCAM-AS1  in  breast  cancer  biology  and 
treatment  resistance,  this  study  provides  insight  into  the  potential  clinical  implications  of 
lncRNAs in breast cancer.
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Long  noncoding  RNAs  (LncRNAs)  have  recently  been 
implicated  in  a  variety  of  biological  processes,  including 
carcinogenesis  and  tumour  growth1-6.  Operating  through  a 
myriad of mechanisms2, lncRNAs have challenged the central
dogma of molecular biology as prominent functional RNA
molecules. To investigate the role of lncRNAs in breast cancer,
we interrogated the expression of lncRNAs across an
RNA-sequencing (RNA-seq) breast tissue cohort comprising
947 breast samples7,8. Previously, in a large-scale ab initio
meta-assembly study from 6,503 RNA-seq libraries, we discovered
~45,000 unannotated human lncRNAs7, and this assembly
was utilized for the present study. Building on prior work that has
begun to investigate the role of lncRNAs in breast cancer9, we set
out to perform a comprehensive analysis of breast cancer tissue
RNA-seq data to identify the lncRNAs potentially involved in
breast  cancer. 

Patients with oestrogen receptor (ER)-positive breast cancer
have a better prognosis than those with ER-negative disease,
owing both to a more indolent natural history and, perhaps
more importantly, to effective anti-oestrogen, also designated
‘endocrine’, therapy10. Despite the efficacy of endocrine therapy,
however, the majority of breast cancer deaths occur in women
with ER-positive breast cancers, because the incidence of
ER-positive versus ER-negative disease is much higher
(approximately 80 versus 20%), and because a substantial
fraction of women have either inherent or acquired endocrine
therapy-resistant disease11.

Taken  together,  these  considerations  highlight  the  pressing 
need  to  understand  the  biology  of  the  ER-driven  breast  cancers 
and  their  mechanism  of  resistance  to  endocrine  therapy.  The 
mechanism  through  which  ER  mediates  cancer  initiation  and 
progression  is  an  area  of  intense  scientific  investigation12-14  that 
remains  incompletely  understood.  In  this  regard,  while 
substantial  research  has  been  focused  on  ER  abnormalities, 
such as mutations in the gene encoding ER (ESR1)14,15, and on
the co-existing activation pathways that might mediate resistance,
such as HER2 (ref. 16), few studies exist that interrogate ER-regulated
noncoding RNAs17-21. Therefore, we set out to perform a
comprehensive discovery and investigation of those lncRNAs that
are driven by oestrogen in breast cancers, drawing from a large
human tissue RNA-seq cohort.

Results 

Identification of ER- and breast cancer-associated lncRNAs.

We initially focused on those lncRNAs most differentially
expressed in breast cancers in comparison to benign adjacent
tissue (Supplementary Data 1), utilizing a non-parametric
differential expression tool for RNA-seq called Sample Set
Enrichment Analysis (SSEA)7. After applying an expression filter (at
least 1 fragment per kilobase of transcript per million mapped
reads (FPKM) in the top 5% of breast samples
ranked by expression level), we identified 437 of the most
differentially expressed lncRNAs in breast cancer (Supplementary
Data 2). Interestingly, unsupervised hierarchical clustering of the
samples based on expression of these lncRNAs across all breast
cancer samples (Methods section) largely separated the breast
cancer samples by PAM50 subtypes22,23, suggesting that lncRNAs
may be contributing to the distinct biology of these subtypes
(Fig. 1a). While lncRNA expression was unable to distinguish
between the ER-driven luminal A and luminal B subtypes, the
luminal subtypes were well separated from the HER2, basal and
normal subtypes (Fig. 1a). In addition to separating the
clinical subtypes of breast cancer, the lncRNAs themselves
separated into three distinct clusters. The first cluster (Fig. 1a,
‘Luminal’) contains lncRNAs overexpressed mostly in luminal A
and luminal B samples, with little expression in samples of the
other subtypes, and little expression in normal samples. The next
cluster contains lncRNAs upregulated across all breast cancer
samples (Fig. 1a, ‘Upregulated’), and this cluster included the
known breast cancer lncRNA, HOTAIR. The third cluster
(Fig. 1a, ‘Downregulated’) contains lncRNAs downregulated in
breast cancers. The lncRNAs in the luminal cluster present a
particularly intriguing class of potentially oestrogen-responsive
lncRNAs.
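
The expression filter and the fold-change scaling described for the heatmap can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the filter means "at least 1 FPKM in the 95th-percentile breast sample", and the function names, the nearest-rank percentile and the pseudocount are assumptions made for this sketch rather than details of the study's pipeline.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def passes_expression_filter(fpkm_values, pct=95, min_fpkm=1.0):
    """True if the transcript reaches min_fpkm in the pct-th percentile sample."""
    return percentile(fpkm_values, pct) >= min_fpkm


def log2_fold_over_normal_median(tumour_fpkm, normal_fpkm, pseudocount=1.0):
    """log2 fold change of each tumour sample over the median normal sample.

    The pseudocount is an assumption of this sketch; it avoids dividing by
    zero when a transcript is not detected in normal tissue.
    """
    baseline = statistics.median(normal_fpkm) + pseudocount
    return [math.log2((x + pseudocount) / baseline) for x in tumour_fpkm]
```

Transcripts that pass the filter would then be scaled with `log2_fold_over_normal_median` before clustering; the clustering step itself is not reproduced here.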

Using the 947 breast tumour RNA-seq samples
(Supplementary Data 1), we identified lncRNAs differentially
expressed in ER-positive versus ER-negative breast tumours
(Fig. 1b, Supplementary Data 2). As expected, the expression of
lncRNAs differentially expressed in ER-positive tumours
separated the luminal tumours from the basal and HER2 tumours on
unsupervised hierarchical clustering (Fig. 1b). Interestingly,
a number of lncRNAs that were downregulated in ER-positive
samples exhibited increased expression in the basal samples
(Fig. 1b, ‘Basal lncRNAs’). While these basal lncRNAs were
identified in an ER-positive versus ER-negative cancer analysis, a
number of them also exhibit low expression in normal breast
tissue (Supplementary Fig. 1). Given that a paucity of known
driver genes exists for basal breast cancers and that these tumours
are the most clinically aggressive, these basal-specific lncRNAs
may represent an exciting future area for basal breast cancer
biology.

We set out to investigate potentially oncogenic ER-regulated
lncRNAs by intersecting the lncRNAs upregulated in both
the cancer versus normal (Fig. 1a) and ER-positive versus
ER-negative (Fig. 1b) analyses. Sixty-three lncRNAs were
upregulated in both the cancer versus normal analysis and the
ER-positive versus ER-negative analysis (Supplementary Data 2,
Fig. 1c). To prioritize the most biologically and clinically relevant
lncRNAs, we focused on lncRNAs most highly expressed in
breast cancer tissues, and those most directly regulated by ER,
based on ER binding to the targets’ promoters as well as the degree
of induction of expression following oestrogen stimulation in
breast cancer cells (Fig. 1c and Supplementary Fig. 2a). This
approach nominated DSCAM-AS1 as a lncRNA expressed at a
very high level in breast cancer tissues, exhibiting ER promoter
binding, and showing the strongest oestrogen induction in
MCF7 and T47D cells by both RNA-seq and quantitative PCR
(qPCR) validation (Fig. 1c and Supplementary Fig. 2a). We thus
selected DSCAM-AS1 for further investigation.
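
The intersect-then-rank step described above can be illustrated with a short Python sketch. It assumes each analysis yields a collection of upregulated lncRNA names and that candidates are ranked by expression in the 95th-percentile sample. The function name, the argument names and the placeholder identifiers in the usage note are hypothetical, and ER binding and oestrogen induction are not modelled.

```python
def prioritise_candidates(up_in_cancer, up_in_er_positive, expression_p95, top_n=10):
    """Intersect two sets of upregulated lncRNAs and rank the overlap.

    expression_p95 maps each lncRNA name to its expression (FPKM) in the
    95th-percentile breast cancer sample; higher values rank first, and
    ties are broken alphabetically so the output is deterministic.
    """
    shared = set(up_in_cancer) & set(up_in_er_positive)
    ranked = sorted(shared, key=lambda name: (-expression_p95.get(name, 0.0), name))
    return ranked[:top_n]
```

For example, with placeholder names, `prioritise_candidates({"lnc_A", "lnc_B", "lnc_C"}, {"lnc_B", "lnc_C", "lnc_D"}, {"lnc_B": 5.0, "lnc_C": 40.0})` keeps only the two shared transcripts and lists the more highly expressed one first.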

Characterization of DSCAM-AS1. DSCAM-AS1 has been
previously reported to be involved in the proliferation of a
luminal breast cancer cell line20. It exhibits a highly
cancer-specific expression pattern, mostly in breast cancer and lung
adenocarcinoma, in transcriptome sequencing data from a cohort
of 6,503 cancer and normal tissues and cell lines from the TCGA
and the Michigan Center for Translational Pathology7 (Fig. 2a).
Supporting its association with ER biology, DSCAM-AS1
expression is highly enriched (Student’s t-test, P value <10E-5)
in ER-positive tumours among the breast cancer samples in this
RNA-seq cohort with ER status determined by IHC (Fig. 2b and
Supplementary Data 1). In addition, analysis of RNA-seq
performed on 50 breast cancer cell lines24 revealed that
expression of DSCAM-AS1 is highly specific to ER-positive cell
lines (Fig. 2c and Supplementary Fig. 2b). Further supporting
the association of ER with DSCAM-AS1, ER chromatin
immunoprecipitation-sequencing (ChIP-seq) in both MCF7 and
T47D identified ER binding to the DSCAM-AS1 promoter
following oestrogen stimulation (Fig. 2d), and this finding
was confirmed by ChIP-qPCR of the DSCAM-AS1 promoter

Figure 1 | Identification of ER and breast cancer-associated lncRNAs. (a) Heatmap depiction of the top cancer versus normal differentially expressed
lncRNAs among the TCGA breast RNA-seq cohort (n = 946). 437 lncRNAs were differentially expressed with an SSEA FDR<1e-5 and an SSEA percentile
cutoff of 0.975 (Methods section). Expression values are depicted as log2 of the fold change over the median of the normal samples (n = 104).
Unsupervised hierarchical clustering was done on both lncRNAs and patients. Cancer progression, PAM50 classification, and ER, PR, and HER2 status are
shown above heatmap. LncRNAs clustered into 3 distinct categories, 'Luminal', 'Upregulated', and 'Downregulated'. Two representative lncRNAs are
highlighted. (b) Heatmap depiction of the top ER-positive versus ER-negative lncRNAs. 449 lncRNAs met the SSEA criteria described in a. Unsupervised
clustering was performed for samples and lncRNAs. Expression values depicted as log2 of the fold change over the median of the ER-negative samples
(n = 538). Cancer progression, PAM50 classification, and ER, PR and HER2 status are shown above heatmap. One representative lncRNA is highlighted
along with a group of lncRNAs with basal-specific expression. (c) Venn diagram of the intersection of the breast cancer versus normal and ER-positive
versus ER-negative analyses. Intersection is shown for the overexpressed lncRNAs in both categories. The top 10 lncRNAs based on expression level in
breast cancer tissues (expression value of 95th percentile sample) are depicted in table. ER promoter binding determined via ChIP-seq is depicted (in either
MCF7, T47D cell lines, or both) along with expression response from RNA-seq following 3 h of oestrogen stimulation in MCF7 cells (one arrow represents
>1.5 fold increase, three arrows represent >2.5 fold increase).


(Supplementary Fig. 2c). The isoforms of DSCAM-AS1 in
MCF7 cells were identified using 3' and 5' RACE (Fig. 2d and
Supplementary Table 1). DSCAM-AS1 expression is induced in
both MCF7 and T47D cells after oestrogen stimulation, and this
induction is reversed with addition of tamoxifen, corroborating
that ER is in fact regulating the expression of this lncRNA
(Fig. 2e). Expression of the known ER-regulated protein-coding
genes GREB1 and PGR follows the same pattern of response to
oestrogen, while the lncRNA MALAT1, serving as a negative
control, is not induced by oestrogen (Fig. 2e). In addition to
being oestrogen-responsive, DSCAM-AS1 is present
in both the cytoplasm and nucleus at nearly identical fractions
in both MCF7 and T47D cells (Supplementary Fig. 2d), and the
identity of DSCAM-AS1 as a noncoding gene was corroborated
using the CPAT tool25 (Supplementary Fig. 2e). We used
single-molecule fluorescence in situ hybridization (ISH) to
further dissect the subcellular localization and gene expression
levels of DSCAM-AS1 in breast cancer cells. To this end, we
designed probes that targeted all potential isoforms of the
transcript predicted by RACE. On staining, we found that each
MCF7 cell expressed ~800 copies of the DSCAM-AS1
transcript, almost half as much as the expression level of
GAPDH (Supplementary Fig. 2f,g); in addition, the similar
nuclear and cytoplasmic localization was corroborated by ISH
(Supplementary Fig. 2h). While the abundance of DSCAM-AS1
was lower in T47D cells (~260 molecules per cell, Supplementary
Fig. 2i,j), the relative expression level (compared with
GAPDH) and the subcellular localization pattern were very
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Figure  2  |  Characterization  of  DSCAM-AS1.  (a)  Plot  highlighting  the  expression  in  FPKM  of  DSCAM-AS1  in  the  6,503  sample  MiTranscriptome  RNA-seq 
compendium7 categorized by the different cancer/tissue types. Each point represents one RNA-seq tissue sample. (b) Expression of DSCAM-AS1 is
significantly higher in ER-positive breast cancer tissue samples (n = 584) compared with ER-negative samples (n = 174). Expression was analysed in
samples for which ER IHC was performed. Each point represents one RNA-seq sample. ***P<0.0001, comparing ER-positive with ER-negative. (c) Expression
of DSCAM-AS1 by RNA-seq in breast cancer cell lines categorized by ER status. DSCAM-AS1 expression is significantly higher in ER-positive cell lines
(n = 21) versus ER-negative cell lines (n = 29). Each point represents one cell line. ***P<0.0001, comparing ER-positive to ER-negative via Student's t-test.
(d) UCSC genome browser depiction of DSCAM-AS1 region on chromosome 21. RNA-seq expression track shown in red, and ER ChIP-seq shown in blue.
Refseq transcripts shown in green. RACE verified transcript structure shown in black. (e) qPCR expression of DSCAM-AS1, GREB1, PGR, and MALAT1 8 h
following addition of DMSO vehicle (black), 10 nM estrogen (red), and 10 nM estrogen and 1 µM tamoxifen (blue) in MCF7 and T47D cell lines. Error bars
represent s.e.m. for three biological replicates. **P<0.001, ***P<0.0001, NS: P>0.01 comparing with vehicle for each condition via Student's t-test.
NS,  not  significant. 


similar  to  those  observed  in  MCF7  cells  (Supplementary 
Fig.  2k). 


DSCAM-AS1 is implicated in cancer aggression. We next
investigated the clinical relevance of DSCAM-AS1. Given that
DSCAM-AS1 is a lncRNA, its expression is not measured by
most traditionally used microarrays, which are the primary
high-throughput platforms annotated with reliable clinical outcomes
in breast cancer26. As a surrogate, we employed a guilt-by-association
analysis to interrogate the clinical relevance of those
genes most correlated to DSCAM-AS1. Given that DSCAM-AS1 is
an ER-regulated lncRNA, correlation was performed using only
ER-positive breast cancers, to ascertain clinical relevance in the
breast cancer samples in which DSCAM-AS1 would be enriched
and most relevant. We obtained a number of breast cancer
clinical data sets from Oncomine27 containing gene expression
sets associated with the presence of cancer (versus normal tissue),
high clinical grade, recurrence, survival and metastasis22,23,26-40
(Methods section). We assessed the overlap between these
gene sets and the genes most positively or negatively correlated
to DSCAM-AS1. DSCAM-AS1 positively correlated genes were
significantly associated with clinical signatures of
increased cancer aggression, tamoxifen resistance, higher grade,
stage and metastasis (Fig. 3a,b, Supplementary Data 3 and 4).
Similarly, the DSCAM-AS1 negatively correlated genes were associated
with clinical signatures that portended a more favourable clinical
outcome (Supplementary Fig. 3a,b, Supplementary Data 3 and 4).
For many of the clinical concepts, DSCAM-AS1 positively
correlated genes displayed a clinical association comparable to
those genes most correlated to EZH2, a gene known to be a
marker of clinical aggressiveness in breast cancer41, while genes
correlated to other lncRNAs expressed in breast tissue,
such as HOTAIR, MALAT1 and NEAT1, showed modest-to-no
association (Fig. 3b, Supplementary Fig. 3b, Supplementary
Data 3 and 4). In addition, performing a Gene Set Enrichment
Analysis (GSEA)42 on all genes correlated to DSCAM-AS1 yielded
significant association with a myriad of breast cancer, cancer
aggressiveness, and ER- and tamoxifen-associated gene signatures
(Supplementary Fig. 3c). While ER-positive breast cancers
typically result in better clinical outcomes23, among the luminal
breast cancers, DSCAM-AS1 is expressed significantly higher in
luminal B, a clinical subtype containing most of the clinically
aggressive ER-positive breast cancers22,23 (Fig. 3c). Despite these
associations of clinical aggression with DSCAM-AS1, in a survival
analysis of the ER-positive TCGA breast samples, expression of
DSCAM-AS1 was not significantly associated with clinical
outcome (Supplementary Fig. 3d). Definitive assessment of
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Figure  3  |  DSCAM-AS1  is  implicated  in  cancer  aggression  clinically  and  in  cell  lines,  (a)  Cytoscape  depiction  of  the  overlap  between  the  150  genes 
most positively correlated with DSCAM-AS1 and clinical signatures from Oncomine27 for breast cancer clinical outcomes (i.e., recurrence, survival and
metastasis), high cancer grade, and cancer versus normal. All significant associations with an odds ratio >6 are shown (Fisher's P value <1E-4). Size of
node reflects the size of the gene signature, and the thickness/redness of the line represents the magnitude of the odds ratio. (b) Heatmap displaying the
overlap between the top 150 genes correlated to DSCAM-AS1, EZH2, HOTAIR, MALAT1 and NEAT1 and the genes positively associated with various breast
cancer clinical signatures (see above). For each gene, the top row depicts the odds ratio for the positively correlated genes (red), and the bottom
row represents the odds ratio for the negatively correlated genes (blue). The first name of the author for each clinical study is listed. (c) Expression of
DSCAM-AS1 from breast cancer RNA-seq by PAM50 classification (n = 946). Luminal B expression is significantly greater than Luminal A (Student's t-test,
P = 0.006). (d) Incucyte proliferation assay performed following knockdown of DSCAM-AS1 using two independent shRNAs. Degree of knockdown
determined by qPCR shown above. Error bars represent the s.e.m. for three biological replicates. *P<0.01, **P<0.001, ***P<0.0001, NS: P>0.01
comparing to shControl for each condition via Student's t-test. (e) Invasion assay following shRNA knockdown of DSCAM-AS1 using Matrigel coated
Boyden chamber assay. Error bars represent the s.e.m. for three biological replicates. *P<0.01, **P<0.001, ***P<0.0001 comparing to shControl for each
condition via Student's t-test. (f) Soft agar colony formation assay following shRNA knockdown of DSCAM-AS1. Error bars represent the s.e.m. for
three biological replicates. ***P<0.0001 comparing to shControl for each condition via Student's t-test. (g) Invasion assay following overexpression of
DSCAM-AS1 and LacZ control. Error bars represent the s.e.m. for three biological replicates. Representative invasion images shown above. ***P<0.0001,
comparing to vector overexpression via Student's t-test. (h) Mouse xenograft study of tumour growth for T47D cells with shRNA knockdown of
DSCAM-AS1. Xenografts with shRNA knockdown of DSCAM-AS1 (n = 24) exhibited reduced growth when compared to control shRNA knockdown (n = 26).
Error bars represent the s.e.m. for all xenografts used. ***P<0.0001, comparing to shControl via Student's t-test. (i) Assessment of xenograft metastasis to
liver by human Alu PCR, which detects the human cancer cells in the mouse liver. Error bars represent the s.e.m. for three biological replicates. **P<0.001
comparing with shControl via Student's t-test. NS, not significant.


NATURE COMMUNICATIONS | 7:12791 | DOI: 10.1038/ncomms12791 | www.nature.com/naturecommunications




survival  in  this  cohort,  however,  will  likely  require  more  robust 
and  longer-term  clinical  curation  of  the  TCGA  breast  samples. 
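
The overlap statistic used in the guilt-by-association analysis (an odds ratio with a Fisher's exact P value, as reported in the Figure 3 legend) can be sketched as follows. This is an illustrative Python version under stated assumptions: a one-sided test for enrichment, a user-supplied background gene list, and hypothetical function and argument names. It is not the implementation used in the study.

```python
from math import comb


def overlap_enrichment(correlated_genes, signature_genes, background_genes):
    """Odds ratio and one-sided Fisher's exact P value for a gene-set overlap.

    Genes absent from the background are ignored. The one-sided
    (enrichment) alternative is an assumption of this sketch.
    """
    background = set(background_genes)
    correlated = set(correlated_genes) & background
    signature = set(signature_genes) & background

    a = len(correlated & signature)   # in both gene sets
    b = len(correlated - signature)   # correlated genes only
    c = len(signature - correlated)   # signature genes only
    d = len(background) - a - b - c   # in neither gene set

    # The sample odds ratio is reported as infinite when b or c is zero.
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)

    # Hypergeometric upper tail: probability of an overlap of at least a genes.
    n_total, n_corr, n_sig = len(background), len(correlated), len(signature)
    tail = sum(
        comb(n_sig, k) * comb(n_total - n_sig, n_corr - k)
        for k in range(a, min(n_corr, n_sig) + 1)
    )
    return odds_ratio, tail / comb(n_total, n_corr)
```

In practice the first argument would hold the genes most correlated with DSCAM-AS1 and the second a clinical signature gene set; the choice of background (for example, all genes measured on the platform) affects both the odds ratio and the P value.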

We then studied the role of DSCAM-AS1 in oncogenic
phenotypes in ER-positive breast cancer cell lines. In MCF7 and
T47D cells, stable knockdown of DSCAM-AS1 was achieved using
shRNA approaches. DSCAM-AS1 knockdown reduced the
proliferative ability of both cell lines (Fig. 3d), diminished the
ability of these cells to invade in a Boyden chamber invasion assay
(Fig. 3e), and substantially abolished the ability of these cells to
form colonies in soft agar (Fig. 3f). While ER regulates levels of
DSCAM-AS1, ER expression and protein levels are not dependent
on the level of DSCAM-AS1 (Supplementary Fig. 4a), ruling out the
possibility that the phenotype observed could be explained
through changes in the level of ER. In addition, knockdown of
DSCAM-AS1 had no effect on RNA or protein levels of the
DSCAM gene, within which DSCAM-AS1 resides antisense and
intronic (Supplementary Fig. 4b). To further demonstrate the
impact of DSCAM-AS1 on aggressive cancer phenotypes, we
overexpressed DSCAM-AS1 in T47D (Supplementary Fig. 4c) and
ZR75-1 (Supplementary Fig. 4d), two ER-positive breast cancer
cell lines with moderate DSCAM-AS1 expression (Fig. 2c), and
observed an increase in the invasion phenotype (Fig. 3g and
Supplementary Fig. 4e). MCF7 cells were not included in the
overexpression studies as DSCAM-AS1 is already expressed at a
very high level in these cells (Fig. 2c). Overexpression was also
tested in MDA-MB-231 cells (Supplementary Fig. 4f), a common
ER-negative cell line. However, exogenous DSCAM-AS1 was
unable to confer oncogenicity as measured by proliferation (Supplementary
Fig. 4g) and invasion (Supplementary Fig. 4h). This phenomenon
may be explained by a requisite genetic and epigenetic milieu
provided by ER-positive cells in order for DSCAM-AS1 to confer
its cancer phenotype, and more investigation into the precise
mechanisms through which it acts will shed light on this finding.
Furthermore, the simple presence of DSCAM-AS1 alone is not
sufficient to make cells highly aggressive, as evidenced by its high
expression in ER-positive cell lines that are only moderately invasive
(for example, MCF7). To further characterize the impact of
DSCAM-AS1 on cancer phenotype, we performed a mouse
xenograft tumour growth assay, showing that loss of DSCAM-AS1
reduces the growth of implanted T47D cells in vivo (Fig. 3h). The
metastatic potential of these implanted cells was also reduced
with DSCAM-AS1 knockdown, as evidenced by decreased
liver metastasis following xenograft (Fig. 3i).


Role of hnRNPL in DSCAM-AS1 mechanism. LncRNAs have
been shown to be functional through their binding interactions
with other RNAs, DNA, and with proteins2. Thus, identifying
protein binding partners for DSCAM-AS1 is a crucial step
in determining the mechanism through which it confers
oncogenicity. To identify DSCAM-AS1 binding partners, we
performed pull-down of DSCAM-AS1 followed by mass
spectrometry on the pull-down product to identify proteins
bound to DSCAM-AS1 (Fig. 4a). The protein hnRNPL was
observed to have the highest spectral counts for the sense form of
DSCAM-AS1 with zero spectral counts in the antisense pull-down
(Fig. 4b). In addition, PCBP2, a protein known to complex
with hnRNPL43, was also among the top proteins bound to
DSCAM-AS1. We thus investigated the interaction between
DSCAM-AS1 and hnRNPL further. HnRNPL is a protein
widely expressed in many tissue types (Supplementary Fig. 5a)
and has been implicated in regulating RNA stability and
processing with subsequent effects on gene expression44-47.
The binding of hnRNPL to DSCAM-AS1 was confirmed by
RNA pull-down followed by western blot, with no binding of
hnRNPL to the negative control antisense transcript (Fig. 4c).


Other RNA-binding proteins did not bind DSCAM-AS1,
however, suggesting that DSCAM-AS1 does not promiscuously
bind to RNA-binding proteins in general (Fig. 4c). To
further confirm this binding interaction and its specificity, RNA
immunoprecipitation (RIP) was performed using antibodies
directed against hnRNPL. DSCAM-AS1 was highly enriched by
anti-hnRNPL RIP in both MCF7 and T47D cells, while control
coding and noncoding genes exhibited modest binding (Fig. 4d).
In addition, anti-snRNP70 and anti-HuR RIP failed to pull down
DSCAM-AS1, further suggesting the specificity of the
DSCAM-AS1-hnRNPL interaction (Supplementary Fig. 5b).

To more specifically investigate the functional relationship of
DSCAM-AS1 and hnRNPL, we performed rescue studies
assessing the impact of hnRNPL knockdown on the invasive
advantage conferred by DSCAM-AS1 overexpression, observing
that reduction of hnRNPL levels entirely reversed the increase in
invasion observed on DSCAM-AS1 overexpression (Fig. 4e,
Supplementary Fig. 6a). Because there was only a slight,
non-significant reduction in invasion with hnRNPL knockdown in
control cells, the marked reduction in invasion observed in the
DSCAM-AS1-overexpressing cells with hnRNPL knockdown may
be the result of hnRNPL affecting invasion through a mechanism
exclusive to DSCAM-AS1. To further characterize the
functional relationship between DSCAM-AS1 and hnRNPL, we
set out to localize the binding site of hnRNPL within the
DSCAM-AS1 lncRNA. Using in silico prediction drawing from prior
studies of hnRNPL crosslinking-immunoprecipitation sequencing
(CLIP-seq)48, a single strong predicted binding peak was
identified near the 3'-end of DSCAM-AS1 (Fig. 4f). HnRNPL
has been shown to bind CACA-rich RNA sites45, and the
predicted binding region possessed a 10 base pair CACA stretch.
To identify if this predicted region does in fact account for the
hnRNPL binding, multiple mutant forms of DSCAM-AS1 were
created with or without the binding site. DSCAM-AS1-5 and
DSCAM-AS1-3 are large deletion mutants containing only the
5'- and 3'-end, respectively, with only DSCAM-AS1-3 possessing
the predicted binding site, and DSCAM-AS1-D is a mutant form
with the 27 nucleotides comprising the predicted binding site
deleted (Fig. 4f,g, red). The various mutant forms of DSCAM-AS1
were expressed in HEK293, a cell line that lacks endogenous
DSCAM-AS1 expression while still expressing hnRNPL
(Supplementary Fig. 6b). While both the full-length and
DSCAM-AS1-3 mutant retained hnRNPL binding, loss of the
predicted binding region abrogated hnRNPL binding, as shown
by both western blot following RNA pull-down (Fig. 4g)
and qPCR following hnRNPL RIP (Supplementary Fig. 6c). All
deletion mutants were expressed at comparable levels, ruling out
the possibility of falsely diminished binding due to failed
expression of the mutant construct (Supplementary Fig. 6d).
RNA secondary structure is a crucial component of RNA
functionality and is a key player in RNA-protein interactions.
While the 27-nucleotide deletion in the DSCAM-AS1-D mutant is
a small fraction of the total number of bases in the transcript,
to ensure that this deletion was not causing a marked RNA
secondary structure change, we investigated the impact of this
deletion on RNA secondary structure via the RNAfold structure
prediction tool49. As evidenced by a minimal free energy prediction,
the posited secondary structure of DSCAM-AS1 is largely
similar to that of DSCAM-AS1-D (Supplementary Fig. 6e),
suggesting that the loss of hnRNPL binding observed with the
DSCAM-AS1-D mutant is not due to a dramatic secondary
structure rearrangement. Notably, overexpression of
the DSCAM-AS1-D mutant in T47D cells failed to recapitulate the
increased invasion observed when overexpressing full-length
DSCAM-AS1 (Fig. 4h). This finding, in combination with the
rescue studies following hnRNPL knockdown (Fig. 4e), strongly
Figure 4 | Physical and functional relationship of DSCAM-AS1 with hnRNPL. (a) Schematic representation of the RNA pull-down technique used to
identify protein binding partners of DSCAM-AS1. The BrU-labelled RNA transcripts are incubated with cell lysate from T47D cells and the eluted protein is
resolved by SDS-PAGE. RNA-bound protein product is then processed by mass spectrometry. (b) Top protein binding partners for DSCAM-AS1. Pull-down
of LacZ and antisense DSCAM-AS1 used as control. S/AS ratio determined as sense counts divided by 1 + antisense counts. (c) Western blot of hnRNPL,
hnRNPK, snRNP70, and HuR following pull-down of BrU-labelled DSCAM-AS1 and antisense DSCAM-AS1. (d) qPCR following RIP for hnRNPL performed in
MCF7 and T47D cells. Data represented as fold-enrichment over IgG RIP. Error bars represent the s.e.m. for three biological replicates. (e) Invasion assay for
T47D cells overexpressing LacZ control or DSCAM-AS1 following siRNA-mediated knockdown of hnRNPL. Error bars represent the s.e.m. for two biological
replicates. **P<0.001, NS: P>0.01 comparing to siControl (unless otherwise specified with dotted line) for each condition via Student's t-test. (f) Per-base
in silico prediction for binding of hnRNPL to DSCAM-AS1. One strong predicted binding peak exists in the 3' region of DSCAM-AS1 shown in red.
(g) Schematic depicting the mutant forms of DSCAM-AS1 generated with or without the predicted binding site (top). Western blot for hnRNPL shown
following pull-down of each mutant form of DSCAM-AS1 in HEK293 cells (bottom). (h) Invasion assay in T47D cells overexpressing LacZ control, full-length
DSCAM-AS1, and the DSCAM-AS1-D mutant. Error bars represent the s.e.m. for three biological replicates. **P<0.001, NS: P>0.01 comparing to vector
overexpression for each condition via Student's t-test. NS, not significant; siRNA, small interfering RNA.


suggest  that  DSCAM-AS1  promotes  oncogenicity  via  its 
interaction  with  hnRNPL  in  these  ER-positive  breast  cancer  cells. 

Role of DSCAM-AS1 in tamoxifen resistance. A substantial
number of patients with ER-positive breast cancer eventually
develop resistance to endocrine therapy and present with clinical
recurrence and metastasis11,50,51. Thus, as DSCAM-AS1 is
implicated in poor-prognosis ER-positive breast cancer (Fig. 3a-c
and Supplementary Fig. 3), we set out to investigate its potential
role in subverting oestrogen dependence and promoting
resistance to anti-oestrogen therapies. We continuously
passaged MCF7 cells in 1 µM tamoxifen for 6 months until we
attained a subpopulation of MCF7 cells that were able to grow
in tamoxifen and termed these tamoxifen-resistant MCF7 cells
(TamR-MCF7). Interestingly, although expression of canonical
ER targets (GREB1 and PGR) was decreased compared to the
parental MCF7 cells, DSCAM-AS1 expression was significantly
upregulated despite already being expressed at very high levels in
MCF7 cells (Fig. 5a). The levels of ER were also increased, which
is likely a compensatory upregulation in response to the continual
anti-oestrogen effects of tamoxifen. Additionally, short-term
tamoxifen treatment of parental MCF7 cells transiently reduced
DSCAM-AS1 levels at 8 h following tamoxifen treatment, with a
rise back to pre-treatment levels after 24 h (Supplementary
Fig. 7a). In contrast, the canonical ER target GREB1 exhibited
pronounced expression reduction at both the short- and
long-term timescale (Supplementary Fig. 7b). To interrogate whether
Figure 5 | DSCAM-AS1 is implicated in tamoxifen resistance. (a) qPCR expression of DSCAM-AS1, ESR1, GREB1 and PGR in tamoxifen-resistant MCF7 cells
relative to parental MCF7. Error bars represent the s.e.m. for three biological replicates. *P<0.01, ***P<0.0001, comparing to parental MCF7 for each
condition via Student's t-test. (b) Proliferation assay in parental MCF7 cells and in TamR-MCF7 cells following siRNA-mediated knockdown of DSCAM-AS1
via two independent siRNAs. Error bars represent the s.e.m. for three biological replicates. **P<0.001, ***P<0.0001, comparing to parental TamR siNT for
each condition via Student's t-test. (c) WST viability assay following 10 days of culture in varying levels of tamoxifen performed for T47D cells
overexpressing LacZ control and DSCAM-AS1. Error bars represent the s.e.m. for three biological replicates. ***P<0.0001, comparing to LacZ
overexpression via Student's t-test. (d) Depiction of oestrogen receptor binding to the DSCAM-AS1 (left) and GREB1 (right) promoters via ChIP-seq
performed in primary and metastatic breast cancer tumour tissues. ER status and response to tamoxifen treatment detailed to left. ER binding peaks
(determined using MACS software) are depicted in red (for promoter binding) and black (for non-promoter binding). Promoter defined as 5 kb upstream of
any transcriptional start site. ER promoter binding indicated by red check or Y to the right. Genomic coordinates in hg19 listed above. siRNA, small
interfering RNA.


this  upregulation  of  DSCAM-AS1  in  the  TamR-MCF7  cells  is 
functionally  significant,  we  assessed  the  proliferative  capacity  of 
these  cells  following  DSCAM-AS1  knockdown.  With  knockdown 
levels  of  DSCAM-AS1  comparable  to  the  endogenous  levels  in 
parental  MCF7  cells  (Supplementary  Fig.  7c),  knockdown  of 
DSCAM-AS1  in  TamR-MCF7  cells  led  to  a  loss  of  their  baseline 
proliferative  advantage  when  cultured  in  tamoxifen,  exhibiting  a 
proliferation  profile  nearly  identical  to  that  of  the  parental  MCF7 
cells  (Fig.  5b).  Additionally,  knockdown  of  hnRNPL  in  these 
cells  produced  a  similar  loss  of  proliferative  capacity  in  the 
TamR-MCF7  cells  (Supplementary  Fig.  7d,e),  suggesting  that 
both  DSCAM-AS1  and  hnRNPL  may  be  playing  a  role  in 
promotion  of  the  tamoxifen  resistance  developed  by  these  cells. 

We also interrogated the ability of DSCAM-AS1 to confer
tamoxifen resistance in native T47D cells via overexpression of
DSCAM-AS1. DSCAM-AS1 overexpression was also associated
with tamoxifen-resistant growth in a dose-dependent manner,
with a striking increase in cell viability at levels of tamoxifen as
low as 100 nM (Fig. 5c). Additionally, in line with the ability
of DSCAM-AS1 to provide oestrogen-independent growth
advantage, cells overexpressing DSCAM-AS1 also exhibited a
proliferative advantage when grown in oestrogen-deprived
medium compared to normal serum (Supplementary Fig. 7f).
This growth advantage was abolished with the addition
of oestrogen, and returned with the subsequent addition of
tamoxifen (Supplementary Fig. 7f). Conversely, corroborating the
relationship between DSCAM-AS1 and oestrogen dependence in these
cells, we witnessed an increased oestrogen dependence of T47D
cells following DSCAM-AS1 knockdown (Supplementary Fig. 7g).

To corroborate our in vitro findings in a tissue model, we
obtained data previously generated by performing ChIP-seq for ER
in primary and metastatic breast tumour tissue13. These tumours
were grouped into the following categories as previously
described13: primary ER-negative (n = 2), primary ER-positive,
tamoxifen-responder (n = 8), primary ER-positive,
tamoxifen-non-responder (n = 7), metastatic ER-positive (n = 3). Strikingly,
investigation of the DSCAM-AS1 promoter revealed that
ER preferentially binds to the DSCAM-AS1 promoter in
tumours with clinical aggression (i.e., metastatic and tamoxifen
non-responders; Fig. 5d), while a canonical ER target, GREB1,
exhibits ER binding to its promoter in nearly all ER-positive
tumours, lacking preference for the more clinically aggressive
tumours. Altogether, these data suggest that the association
between DSCAM-AS1 expression and clinical aggressiveness in
ER-positive breast cancer samples may be explained, in part, by
the ability of DSCAM-AS1 to facilitate oestrogen-independent
oncogenicity, thus potentially promoting resistance to endocrine
therapy with tamoxifen.

Discussion 

Further investigation and study of the mechanisms through
which ER-dominant breast cancers become aggressive and
eventually evade traditional clinical therapies are of intense clinical
interest. In this study, we identify a myriad of potentially
ER-associated lncRNAs, and functionally and mechanistically
characterize one of the most intriguing candidates. Nevertheless,
further investigation of some of these other lncRNAs may also
contribute to our understanding of ER biology and ER-driven
oncogenesis. LncRNAs have been shown to function through
multiple mechanisms, and the study of the interaction of
DSCAM-AS1 with hnRNPL is a promising step towards
understanding the ways through which this molecule executes its
oncogenic function. While we show that the binding of hnRNPL
to DSCAM-AS1 is responsible for at least some of its
oncogenicity, a further understanding of how the interaction
between hnRNPL and DSCAM-AS1 mediates this phenotype
is necessary.

Novel mediators of tumour aggression, such as DSCAM-AS1,
can provide insight into the mechanism of endocrine therapy
resistance. This increased understanding may in turn lead to
more effective strategies to overcome this resistance, which is one
of the last great clinical challenges in treatment of ER-positive
breast cancer. In addition, little is known regarding the role
of noncoding RNAs in developing resistance to anti-oestrogen
therapy, with a small number of studies implicating some of the
more prominent, well-characterized breast cancer lncRNAs52,53.
DSCAM-AS1 is just one of many potentially relevant
ER-regulated lncRNAs in breast cancer, and further investigation of
the other candidates is likely to yield a greater understanding of
ER-mediated cancer biology. Ultimately, this study provides key
insight into the role of lncRNAs in ER breast cancer biology, and
is an important step in better understanding this common
disease.


Methods 

Cell lines and cell culture. All cell lines were obtained from the American Type
Culture Collection (Manassas, VA). Cell lines were maintained using standard
media and conditions. Specifically, T47D cells were maintained in Roswell Park
Memorial Institute (RPMI) 1640 medium supplemented with 10% fetal bovine
serum, 1% penicillin-streptomycin and 5 mg ml⁻¹ insulin. ZR75-1 cells were
maintained in RPMI 1640 medium supplemented with 10% fetal bovine serum and 1%
penicillin-streptomycin. HEK293 cells were maintained in DMEM plus 10% fetal
bovine serum (FBS) plus 1% penicillin-streptomycin. MCF7 cells were cultured in
Dulbecco’s modified Eagle’s medium plus GlutaMAX (DMEM, Invitrogen)
containing 10% fetal bovine serum (Hyclone) and 1% penicillin-streptomycin.

To establish the tamoxifen-resistant cell line, MCF7 cells were grown in IMEM
phenol-red free medium with 10% charcoal-stripped FBS in the presence of 1 µM
(Z)-4-hydroxytamoxifen (Sigma) for 6 months. All cell lines were grown at 37 °C in
a 5% CO2 cell culture incubator, and were genotyped for identity at the University
of Michigan Sequencing Core and tested routinely for Mycoplasma contamination.


Cell proliferation assay. Cells were seeded in a 48-well plate at 3 × 10⁴ cells per
well. Plates were placed in the Incucyte machine (Essen Bioscience) 16-20 h following
seeding. Growth curves were constructed by imaging plates using the Incucyte
system, where the growth curves are generated from confluence measurements
acquired during continuous kinetic imaging. Four wells were measured per
condition. For tamoxifen treatment, 16-20 h after seeding, the medium was
changed to RPMI phenol-red free medium containing 10% charcoal-treated FBS in
the presence of 1 µM tamoxifen or ethanol. Growth curves were obtained using the
Incucyte system as described above.


Cell viability assay. Cells were seeded in 96-well plates at 5,000 cells per well in a
total volume of 100 µl media containing 10% FBS. Serially diluted tamoxifen in
100 µl of media was added to the cells 12 h after seeding. Medium containing
tamoxifen was replenished every 2-3 days. Following 10 days of incubation, cell
viability was assessed by WST assay (WST-8, Dojindo). All assays were performed
in triplicate and repeated at least three times. The relative cell viability was
expressed as a percentage of the control that was treated with vehicle solutions.
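
The normalization described above can be sketched as follows. This is a minimal illustration, assuming that replicate absorbance readings are averaged before normalization and that an optional blank value is subtracted; the function name, the blank handling and the example readings are hypothetical and are not taken from the study.

```python
from statistics import mean


def relative_viability(treated_absorbances, vehicle_absorbances, blank=0.0):
    """Return viability of treated wells as a percentage of the vehicle control.

    Replicate wells are averaged first (an assumption of this sketch), after
    subtracting an optional blank reading from every well.
    """
    treated = mean(a - blank for a in treated_absorbances)
    vehicle = mean(a - blank for a in vehicle_absorbances)
    if vehicle <= 0:
        raise ValueError("vehicle control signal must be positive")
    return 100.0 * treated / vehicle


# Hypothetical triplicate readings for one tamoxifen dose and the vehicle control.
print(relative_viability([0.5, 0.5, 0.5], [1.0, 1.0, 1.0]))
```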

Soft agar colony formation assay. 10,000 cells were suspended in medium
containing 0.3% agar, 10% FBS, and layered on medium containing 0.6% agar and
10% FBS in a six-well plate. Colonies were stained for 18-24 h with
iodonitrotetrazolium chloride (Sigma #18377) following 3 weeks of incubation. Colonies from
three replicate wells were quantified.

Quantitative RT-PCR assay. The miRNeasy mini kit was utilized to isolate RNA
from cell lysates. From 1 µg of isolated RNA, Superscript III (Invitrogen) and
Random Primers (Invitrogen) were used to generate cDNA according to the
manufacturer’s protocol. The ABI7900 HT Fast Real time system (Applied
Biosystems) was utilized for quantitative reverse transcriptase-PCR (qRT-PCR)
reactions. Gene-specific primers were designed using the Primer3 software and were
subsequently synthesized by IDT Technologies. A relative quantification method
was used in analysing the qRT-PCR data and data were depicted as average fold
change versus the control (as internal reference, GAPDH and actin were utilized).
All primers used for qPCR are detailed in Supplementary Table 2. Three technical
replicates were used in each assay, and all experiments shown were performed with at least
three biological replicates.
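
The text does not specify which relative quantification method was used. As an illustration only, the sketch below assumes the widely used 2^(-ΔΔCt) calculation, with the target Ct normalized to the arithmetic mean of the reference-gene Ct values (here standing in for GAPDH and actin). The function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Ct of the target gene minus the mean Ct of the reference genes."""
    return target_ct - mean(reference_cts)


def fold_change(sample_target_ct, sample_ref_cts,
                control_target_ct, control_ref_cts):
    """Fold change of sample versus control by the 2^(-ddCt) method.

    Assumes roughly 100% amplification efficiency for every primer pair,
    which is an assumption of this sketch, not a statement from the study.
    """
    ddct = (delta_ct(sample_target_ct, sample_ref_cts)
            - delta_ct(control_target_ct, control_ref_cts))
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target amplifies two cycles earlier in the sample.
print(fold_change(22.0, [18.0, 18.0], 24.0, [18.0, 18.0]))
```

Averaging the technical replicates for each Ct before calling `fold_change`, and then averaging fold changes across biological replicates, would match the replicate structure described above.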

Oestrogen and tamoxifen treatment. To evaluate the effect of oestrogen
stimulation, cells were first hormone depleted via growth in phenol-red free
medium containing 10% charcoal-treated FBS for 72 h and then treated with
ethanol vehicle, 10 nM β-estradiol, or 10 nM β-estradiol plus 1 µM tamoxifen. After
10 h, RNA was isolated as described above and qPCR was performed as described
above using Power SYBR Green Mastermix (Applied Biosystems).

Subcellular fractionation. Cellular fractionation was performed using a RiboTrap
Kit (MBL International), according to the manufacturer’s instructions. RNA was
isolated and qRT-PCR was performed as described above.

Knockdown and overexpression studies. Knockdown of DSCAM-AS1 and
hnRNPL in T47D and MCF7 cells was accomplished by small interfering RNA
from Dharmacon. Transfections were performed with OptiMEM (Invitrogen) and
RNAi Max (Invitrogen) per manufacturer instruction. Target sequences used for
shRNA or small interfering RNA knockdown are listed in Supplementary Table 3.
For stable knockdown of DSCAM-AS1, MCF7 and T47D cells were transduced
with lentiviral constructs containing 2 different DSCAM-AS1 shRNAs or
non-targeting shRNAs in the presence of polybrene (8 µg ml⁻¹; Supplementary
Table 3). After 48 h, transduced cells were grown in culture media containing
2 µg ml⁻¹ puromycin.

For DSCAM-AS1 overexpression, the predominant isoform (isoform 2,
Supplementary Table 1) was cloned into the pLenti6.3 vector (Invitrogen) using
pCR8 non-directional Gateway cloning (Invitrogen) as an initial cloning vector and
shuttling was then done to pLenti6.3 using LR Clonase II (Invitrogen) according to
the manufacturer’s instructions. As control, LacZ was also cloned into the same
vector system. The primers for making DSCAM-AS1 mutation and truncations are
listed in Supplementary Table 2. Lentiviral particles were made and T47D and
ZR75-1 cells were transduced as described above. Stable cell lines were generated by
selection with 3 µg ml⁻¹ blasticidin. Transient transfection of DSCAM-AS1 and its
derivative mutants in HEK293 cells was performed with Lipofectamine
LTX (Invitrogen) following the manufacturer’s instructions. Cells were collected at 48 h
post transfection.

In vitro RNA-binding assay. The RNA-binding assay was performed according to
the protocol of the RiboTrap Kit (MBL International). In brief, 5-bromo-UTP
(BrU) was randomly incorporated into sense DSCAM-AS1, antisense DSCAM-AS1,
and LacZ control via PCR-based transcription. The primers are shown in
Supplementary Table 2. The BrU-labelled RNA transcripts were bound to beads
conjugated with anti-BrdU antibodies. Then, the cytoplasmic or nuclear fractions
from MCF7 or T47D cells were added and mixed for 2 h. Samples were washed four times
with Wash Buffer II before elution. The samples were sent to the Michigan Center
for Translational Pathology proteomic core facility for mass spectrometry.

Mass spectrometry. The samples were treated with SDS-PAGE loading buffer
supplied with 10 mM DTT for 5 min at 85 °C. The proteins were alkylated by the
addition of iodoacetamide to the final concentration of 15 mM. The samples were
subjected to SDS-PAGE and the whole lanes were cut out and digested with trypsin
in-gel for 2 h. The resulting peptides were extracted, dried and resuspended in 0.1%
formic acid with 5% acetonitrile before loading onto a 2 cm EASY-column
(Thermo Scientific) coupled to an in-house made nano HPLC column (20 cm ×
75 µm) packed with LUNA C18 media (Phenomenex). Analysis was performed on
a Velos Pro mass spectrometer (Thermo Scientific) operated in data-dependent
mode using 120-min gradients in EASY-LC system (Proxeon) with 95% water, 5%
acetonitrile (ACN), 0.1% formic acid (FA) (solvent A), and 95% ACN, 5% water,
0.1% FA (solvent B) at a flow rate of 220 nl min⁻¹. The acquisition cycle consisted
of a survey MS scan in the normal mode followed by 12 data-dependent MS/MS
scans acquired in the rapid mode. Charge state was not recorded. Dynamic
exclusion was used with the following parameters: exclusion size 500, repeat count
1, repeat duration 10 s, exclusion time 45 s. Target value was set at 10⁴ for tandem
MS scan. The precursor isolation window was set at 2 m/z. The complete analysis
comprised two independent biological replicates.

Mass spectrometry data analysis. The resulting spectrum files were transformed
into MGF format by MSConvert software and interrogated by MASCOT 2.4 search
engine using human UniProt database version 15 concatenated with reverse
sequences for estimation of false discovery rate (FDR) and with a list of common
contaminants (40,729 entries in total). The search parameters were as follows: full
tryptic search, 2 allowed missed cleavages, peptide charges +2 and +3 only,
MS tolerance 1 Da, MS/MS tolerance 0.5 Da. Permanent post-translational
modification was: cysteine carbamidomethylation. Variable post-translational
modifications were: protein N-terminal acetylation, Met oxidation and N-terminal
Glutamine to pyro-Glutamate conversion. The remaining analysis was performed
as previously described54. To summarize, the minimal ion score threshold was
chosen such that a peptide FDR below 1% was achieved. The peptide FDR was
calculated as: 2 × (decoy_hits)/(target_hits + decoy_hits). The mass spectrometry
proteomics data have been deposited to the ProteomeXchange Consortium55
via the PRIDE partner repository with the data set identifier PXD002421 and
10.6019/PXD002421. Spectral counts for all detected proteins were assembled using
an in-house written Python script. The adjustment of spectral counts was done as
previously described54.
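
The FDR formula above, together with the S/AS ratio defined in the Figure 4 legend (sense counts divided by 1 + antisense counts), can be sketched as follows. This is not the in-house script mentioned in the text. The threshold search, the function names and the example hits are assumptions made for illustration, and the spectral count adjustment described in ref. 54 is not reproduced.

```python
def peptide_fdr(target_hits, decoy_hits):
    """Peptide FDR as stated in the text: 2 x decoy / (target + decoy)."""
    total = target_hits + decoy_hits
    if total == 0:
        return 0.0
    return 2.0 * decoy_hits / total


def min_score_threshold(scored_hits, max_fdr=0.01):
    """Return the lowest ion-score cutoff whose accepted hits have FDR < max_fdr.

    scored_hits is an iterable of (ion_score, is_decoy) pairs. Hits scoring at
    or above the cutoff are accepted. Returns None if no cutoff qualifies.
    """
    hits = sorted(scored_hits, key=lambda h: h[0], reverse=True)
    best = None
    target = decoy = 0
    for i, (score, is_decoy) in enumerate(hits):
        if is_decoy:
            decoy += 1
        else:
            target += 1
        # Evaluate only once all hits sharing this score have been counted.
        last_of_score = i + 1 == len(hits) or hits[i + 1][0] != score
        if last_of_score and peptide_fdr(target, decoy) < max_fdr:
            best = score
    return best


def s_as_ratio(sense_counts, antisense_counts):
    """Sense/antisense ratio with a pseudocount of 1 in the denominator."""
    return sense_counts / (1 + antisense_counts)


# Hypothetical search results: (ion score, matched a reversed decoy sequence?).
example_hits = [(50, False), (40, False), (30, True), (20, False)]
print(min_score_threshold(example_hits))
print(s_as_ratio(12, 0), s_as_ratio(10, 4))
```

The pseudocount in the denominator keeps the ratio defined for proteins with zero antisense counts, such as the top hit described in the Results.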

RNA  immunoprecipitation.  RIP  assays  were  performed  using  a  Millipore 
EZ-Magna  RIP  RNA-Binding  Protein  Immunoprecipitation  kit  (Millipore, 
#17-700)  according  to  the  manufacturer’s  instructions.  RIP-PCR  was  performed  as 
qPCR,  as  described  above,  using  total  RNA  as  input  controls.  1:150th  of  RIP  RNA 
product  was  used  per  PCR  reaction.  Antibodies  used  for  RIP  are  listed  in 
Supplementary  Table  4.  All  RIP  assays  were  performed  in  biological  duplicate. 

Invasion assay. 3 × 10⁵ cells were seeded in a 24-well Corning FluoroBlok
chamber pre-coated with Matrigel (BD Biosciences). Medium containing 10% FBS
in the lower chamber served as chemoattractant. After 48 h, cells remaining on the
lower side of the membrane were stained with calcein AM (C34852, Invitrogen).
The invasive cells adhering to the bottom surface of the filter were quantified under
a fluorescent microscope (×2).

Antibodies  and  immunoblot  analyses.  Western  immunoblot  assays  were 
performed  by  running  cell  lysates  on  4-12%  SDS  polyacrylamide  gels  (Novex)  to 
separate  proteins.  Proteins  were  then  transferred  to  a  nitrocellulose  membrane 
(Novex)  via  wet  transfer  at  30  V  overnight.  Blocking  buffer  incubation  was  then 
performed  for  1  h  (Tris-buffered  saline,  0.1%  Tween  (TBS-T),  5%  nonfat  dry  milk). 
Indicated  antibodies  were  then  added  to  membrane  and  incubated  at  4  °C 
overnight.  Enhanced  chemiluminescence  (ECL  Prime)  was  utilized  to  develop  blots 
via  the  manufacturer’s  protocol.  All  the  antibodies  used  in  this  study  are  described 
in  Supplementary  Table  4.  Representative  full  blot  images  are  shown  in 
Supplementary  Fig.  8. 

Chromatin immunoprecipitation. HighCell# ChIP kit (Diagenode) was utilized to
perform ChIP assays via the manufacturer’s protocol. Briefly, MCF7 cells were
grown in charcoal-stripped serum media (described above) for 72 h and then
stimulated with 10 nM estradiol for 12 h. Cells were then crosslinked using 1%
formaldehyde for 10 min, and crosslinking was quenched for 5 min at room
temperature using a 1/10 volume of 1.25 M glycine. Cells were then lysed and
sonicated (Bioruptor, Diagenode), yielding an average chromatin fragment size of
300 bp. An amount of chromatin equivalent to 5 × 10⁶ cells was used for
the ChIP for all antibodies. DNA bound to immunoprecipitated product was
isolated (IPure Kit, Diagenode) via overnight incubation with antibody at 4 °C.
Samples were then washed, and crosslinking was reversed.

ChIP-seq library construction and sequencing analysis. DNA was purified
for library preparation using the IPure Kit (Diagenode). The ChIP-seq sample
preparation for sequencing was performed according to the manufacturer’s
instructions (Illumina). ChIP-enriched DNA samples (1-10 ng) were converted
into blunt-ended fragments using T4 DNA polymerase, Escherichia coli DNA
polymerase I large fragment (Klenow polymerase) and T4 polynucleotide kinase
(New England BioLabs (NEB)). A single adenine base was added to fragment ends
by Klenow fragment (3' to 5' exo-; NEB) followed by ligation of Illumina adaptors
(Quick ligase, NEB). The adaptor-modified DNA fragments were enriched by PCR
using the Illumina Barcode primers and Phusion DNA polymerase (NEB). PCR
products were size selected using 3% NuSieve agarose gels (Lonza) followed by gel
extraction using QIAEX II reagents (QIAGEN). Libraries were quantified with the
Bioanalyzer 2100 (Agilent) and sequenced on the Illumina HiSeq 2000 Sequencer
(100-nucleotide read length). ChIP-seq data were mapped to human genome
version hg19 using BWA56. The MACS program57 was used to generate coverage
map files to visualize the raw signal on the UCSC genome browser58. Hpeak59,
a hidden Markov model (HMM)-based peak-calling software program designed for
the identification of protein-interactive genomic regions, was used for ChIP-seq
peak determination.

ChIP-seq  peak  promoter  overlap.  Overlap  of  ChIP-seq  peaks  with  gene 
promoters  was  performed  using  the  BEDTools  ‘coverage’  tool.  Intervals  of  ±  5-10 
kilobases  surrounding  unique  transcriptional  starts  were  used  to  assess  promoter 
overlap. 
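
As a rough illustration of the promoter-overlap step, the sketch below builds a window around each transcription start site and reports the genes whose window overlaps at least one peak. It is a plain-Python stand-in, not the BEDTools ‘coverage’ tool used in the study; the function names, the tuple layout, the half-open coordinates and the example gene names are assumptions made for illustration only.

```python
from typing import List, Tuple

# (chromosome, start, end) in half-open coordinates; this layout is an assumption.
Interval = Tuple[str, int, int]


def promoter_window(chrom: str, tss: int, flank: int) -> Interval:
    """Return the window of +/- flank bases around a transcription start site."""
    return (chrom, max(0, tss - flank), tss + flank)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def promoters_with_peaks(
    tss_sites: List[Tuple[str, str, int]],
    peaks: List[Interval],
    flank: int = 5000,
) -> List[str]:
    """List genes whose promoter window overlaps at least one ChIP-seq peak.

    tss_sites holds (gene, chromosome, tss) tuples with hypothetical gene names.
    """
    hits = []
    for gene, chrom, tss in tss_sites:
        window = promoter_window(chrom, tss, flank)
        if any(overlaps(window, peak) for peak in peaks):
            hits.append(gene)
    return hits


if __name__ == "__main__":
    example_tss = [("GENE_A", "chr1", 10000), ("GENE_B", "chr1", 50000)]
    example_peaks = [("chr1", 14000, 14500), ("chr1", 56000, 56500)]
    # Compare the 5 kb and 10 kb flanks mentioned in the text.
    print(promoters_with_peaks(example_tss, example_peaks, flank=5000))
    print(promoters_with_peaks(example_tss, example_peaks, flank=10000))
```

The linear scan over peaks is adequate for a small example; a genome-wide analysis would normally rely on a sorted or indexed interval structure, as BEDTools does.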

Coding potential scoring. Coding potential for all lncRNA transcripts was
determined as described previously (ref. 4). The alignment-free Coding Potential
Assessment Tool (CPAT; ref. 25) was used to determine coding probability for each
transcript.  CPAT  determines  the  coding  probability  of  transcript  sequences  using  a 
logistic  regression  model  built  from  ORF  size,  Fickett  TESTCODE  statistic,  and 
hexamer  usage  bias. 

Xenograft  analysis.  All  experimental  procedures  were  approved  by  the  University 
of  Michigan  Committee  for  the  Use  and  Care  of  Animals  (UCUCA)  and  conform 
to all regulatory standards. A total of 5 × 10^6 T47D control or T47D shM41
cells, suspended in 100 µl of PBS/Matrigel (1:1), were injected subcutaneously
into 5-week-old pathogen-free female CB-17 severe combined immunodeficient
(CB-17 SCID) mice, which simultaneously received a 60-day slow-release pellet
containing 0.18 mg of 17β-estradiol (Innovative Research of America). Tumours
were measured weekly using digital calipers, and tumour volumes were estimated
using the formula (π/6)(L × W^2), where L = length of tumour and W = width.
In addition, mouse livers were collected to determine spontaneous metastasis by
measuring human Alu sequence. Briefly, genomic DNA from livers was prepared
using the Puregene DNA purification system (Qiagen), followed by quantification
of human Alu sequence by human Alu-specific fluorogenic TaqMan qPCR probes.
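
The tumour volume estimate described above, (π/6)(L × W^2), can be written as a short helper. This is a minimal sketch; the function name is hypothetical, and the measurement unit is not stated in the text, so it is left to the caller.

```python
import math


def tumour_volume(length: float, width: float) -> float:
    """Estimate tumour volume as (pi / 6) * L * W^2.

    length and width are caliper measurements in the same unit (the unit is an
    assumption left to the caller); the result is in that unit cubed.
    """
    if length < 0 or width < 0:
        raise ValueError("caliper measurements must be non-negative")
    return (math.pi / 6.0) * length * width ** 2


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(tumour_volume(10.0, 6.0))
```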

RNA-seq  data  processing.  Sequence  quality  control  was  done  using  FASTQC 
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc).  Next,  reads  mapping 
to mitochondrial DNA, ribosomal RNA, poly-A, poly-C, Illumina sequencing
adaptors, and the spiked-in phiX174 viral genome were filtered out. Sequences were
downloaded from the Illumina iGenomes server (2012, March 9). Mapping was
performed using Bowtie2 (2.0.2). Reads were mapped using TopHat2 (2.0.6 and
2.0.8) with default parameters. A human genome reference was constructed
from UCSC version Feb 2009 (GRCh37/hg19) chromosomes 1-22, X, Y and
mitochondrial DNA; references from alternate haplotype alleles were omitted.
Bowtie-build and bowtie2-build were used to build the genome reference for Bowtie
versions 0.12.8 and 2.0.2, respectively. The Ensembl version 69 transcriptome
was used as a reference gene set. Using the --transcriptome-index option in TopHat
version 2.0.6 (ref. 60), alignment index files were prepared from this reference for
Bowtie versions 0.12.8 and 2.0.2.

RNA-seq  transcript  expression  estimation.  Cufflinks  version  2.1.1  (ref.  61) 
was  used  with  the  following  parameters  to  estimate  transcript  abundance  from 
RNA-seq data: ‘--max-frag-multihits = 1’, ‘--no-effective-length-correction’,
‘--max-bundle-length 5000000’, ‘--max-bundle-frags 20000000’. To convert FPKM
abundance  estimates  (generated  by  Cufflinks)  to  approximate  fragment  count 
values  we  multiplied  each  FPKM  by  the  transcript  length  (in  kilobases)  and  by  the 
‘Map  Mass’  value  (divided  by  1.0E6)  found  in  the  Cufflinks  log  files. 
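
The FPKM-to-count conversion described above amounts to a single multiplication. A minimal sketch follows; the function and argument names are hypothetical, and the transcript length is assumed to be supplied in base pairs and converted to kilobases inside the function.

```python
def fpkm_to_fragment_count(fpkm: float, transcript_length_bp: float, map_mass: float) -> float:
    """Approximate fragment count = FPKM * length (kb) * (Map Mass / 1.0E6).

    map_mass is the 'Map Mass' value reported in the Cufflinks log file.
    """
    length_kb = transcript_length_bp / 1000.0
    return fpkm * length_kb * (map_mass / 1.0e6)


if __name__ == "__main__":
    # Hypothetical values: 10 FPKM, a 2 kb transcript, Map Mass of 5 million.
    print(fpkm_to_fragment_count(10.0, 2000, 5.0e6))
```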

Breast  cancer  tissue  expression  heatmap  generation.  The  ‘gplots’  R-package 
was  used  to  generate  heatmaps  using  the  heatmap.2  function.  For  the  cancer  versus 
normal  heatmap,  expression  was  normalized  as  log2  of  the  fold  change  over  the 
median  of  the  normal  samples  for  each  transcript.  For  the  ER-positive  versus 
ER-negative  heatmap,  expression  was  normalized  to  the  median  of  the  ER-negative 
samples. Unsupervised hierarchical clustering was performed with the hclust
function, using Pearson correlation as the clustering distance and the ‘ward’
agglomeration method.
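
The normalization and distance used for the heatmaps can be sketched in a few lines. The study used R (gplots and hclust); the Python below is only an illustration of the arithmetic. The pseudocount and the conversion of Pearson correlation to a distance as 1 - r are assumptions that the text does not specify, and Ward clustering itself is not reproduced here.

```python
import math
from statistics import median
from typing import List, Sequence


def log2_fold_over_median(
    values: Sequence[float],
    reference_values: Sequence[float],
    pseudocount: float = 1.0,
) -> List[float]:
    """log2 fold change of each value over the median of the reference samples.

    The pseudocount avoids log2(0); it is an assumption, not taken from the text.
    """
    ref = median(reference_values)
    return [math.log2((v + pseudocount) / (ref + pseudocount)) for v in values]


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Distance between two expression profiles, taken here as 1 - Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)
```

For the cancer versus normal heatmap the reference samples would be the normal tissues; for the ER-positive versus ER-negative heatmap they would be the ER-negative samples.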

RNA-seq  differential  expression  testing.  Differential  expression  testing  was 
performed using the SSEA tool described previously (ref. 7). Briefly, following
count data normalization, SSEA performs the weighted KS-test procedure described
in GSEA (ref. 42). The resulting enrichment score statistic describes the enrichment of the
sample  set  among  all  samples  being  tested.  To  test  for  significance,  SSEA 
enrichment tests are performed following random shuffling of the sample labels.
These shuffled enrichment tests are used to derive a set of null enrichment scores
(1,000  null  enrichment  scores  computed).  The  nominal  P  value  reported  is  the 
relative  rank  of  the  observed  enrichment  score  within  the  null  enrichment  scores. 
Multiple  hypothesis  testing  is  performed  by  comparing  the  enrichment  score  of  the 
test  to  the  null  normalized  enrichment  score  distributions  for  all  transcripts  in  a 
sample  set.  This  null  normalized  enrichment  score  distribution  is  used  to  compute 
FDR Q values in the same manner used by GSEA (ref. 42).
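
The permutation scheme described above can be illustrated with a simplified weighted running-sum statistic. This is a sketch in the spirit of the GSEA procedure, not the SSEA implementation: the function names, the two-sided comparison of absolute scores and the add-one correction of the empirical P value are assumptions, and the FDR step is omitted.

```python
import random
from typing import List, Sequence, Tuple


def enrichment_score(weights: Sequence[float], in_set: Sequence[bool], power: float = 1.0) -> float:
    """Weighted KS-like running sum over samples ranked by decreasing weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    hit_total = sum(abs(weights[i]) ** power for i in order if in_set[i])
    n_miss = sum(1 for flag in in_set if not flag)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")
    running, best = 0.0, 0.0
    for i in order:
        if in_set[i]:
            running += abs(weights[i]) ** power / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                          # step down on a miss
        if abs(running) > abs(best):
            best = running                                   # keep the largest deviation
    return best


def permutation_p_value(
    weights: Sequence[float],
    in_set: Sequence[bool],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed score, empirical P value) from shuffled sample labels."""
    rng = random.Random(seed)
    observed = enrichment_score(weights, in_set)
    labels: List[bool] = list(in_set)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(enrichment_score(weights, labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

With 1,000 shuffles, as in the text, the smallest P value this sketch can report is 1/1,001.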

Associations with Oncomine clinical signatures. We identified the top 150
positively and negatively correlated genes (Spearman’s correlation) to DSCAM-AS1
among the ER-positive breast cancer samples. These gene lists were imported into
Oncomine (ref. 27) as custom concepts. We then nominated significantly associated
breast cancer concepts with odds ratio > 4.0 for negatively associated concepts and
> 6.0 for positively associated concepts and P value < 1 × 10^-6. Nodes and edges
of  these  associations  were  exported  and  a  concept  association  network  was 
generated  using  Cytoscape  version  3.2.1.  Node  positions  were  computed  using  the 
Force-Directed  Layout  algorithm  in  Cytoscape  using  the  odds  ratio  as  the  edge 
weight.  Node  positions  were  subtly  altered  manually  to  enable  better  visualization 
of  node  labels. 

Association  of  correlation  signatures  with  oncomine  concepts.  Correlation 
analysis  described  above  was  performed  for  DSCAM-AS1,  EZH2,  HOTAIR, 
MALAT1,  and  NEAT1.  For  each  gene,  we  created  a  signature  of  the  top  150  most 
positively  and  top  150  most  negatively  correlated  genes.  We  performed  a  Fisher’s 
exact  test  of  overlap  for  each  of  the  above  gene  signatures  with  Oncomine  clinical 
signatures for cancer versus normal, clinical recurrence, clinical survival,
metastasis, and high clinical grade. The following studies were utilized:
Curtis Breast (ref. 26), Ma Breast (ref. 62), TCGA Breast (ref. 28),
Zhao Breast (ref. 29), Bittner Breast (ref. 63), Desmedt Breast (ref. 30),
Ivshina Breast (ref. 31), Loi Breast (ref. 32), Lu Breast (ref. 33),
Perou Breast (ref. 22), Schmidt Breast (ref. 34), Sorlie Breast (ref. 23),
vantVeer Breast (ref. 64), Wang Breast (ref. 36), Boersma Breast (ref. 37),
Kao Breast (ref. 38), Symmans Breast (ref. 39) and vandeVijver Breast (ref. 40).
For each Oncomine concept, overlap
was  tested  for  the  top  1,  5  and  10%  of  genes  up-  and  downregulated,  and  the  gene 
signature  with  the  greatest  odds  ratio  was  selected  for  each  study.  Signature 
comparisons  were  performed  using  a  one-sided  Fisher’s  exact  test. 
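
A one-sided Fisher’s exact test of overlap between a correlation signature and a clinical concept can be computed directly from the hypergeometric distribution. The sketch below is illustrative only; the function names and the choice of gene universe are assumptions, and it is not the Oncomine implementation.

```python
from math import comb
from typing import Iterable, Tuple


def overlap_table(signature: Iterable[str], concept: Iterable[str], universe: Iterable[str]) -> Tuple[int, int, int, int]:
    """Build the 2x2 table (a, b, c, d) of signature versus concept membership."""
    universe_set = set(universe)
    sig = set(signature) & universe_set
    con = set(concept) & universe_set
    a = len(sig & con)                 # genes in both lists
    b = len(sig - con)                 # signature only
    c = len(con - sig)                 # concept only
    d = len(universe_set) - a - b - c  # genes in neither list
    return a, b, c, d


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided P value: probability of an overlap of at least a genes."""
    n = a + b + c + d
    row1 = a + b   # signature size
    col1 = a + c   # concept size
    total = comb(n, row1)
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / total


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a * d) / (b * c); infinite when b * c is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)
```

Running this for the top 1, 5 and 10% gene cut-offs of a concept and keeping the cut-off with the greatest odds ratio would mirror the selection described in the text.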

Survival analysis with TCGA breast data. Association of DSCAM-AS1 levels with
clinical outcomes was assessed using the TCGA breast cohort. Survival data were
obtained from the TCGA data portal. ER-positive samples, as indicated by the IHC
status in the TCGA clinical metadata, were used for survival analysis. Samples
with DSCAM-AS1 expression > 10 FPKM were grouped into the ‘DSCAM-AS1
high’ category and samples with expression < 1 FPKM were grouped into the
‘DSCAM-AS1 low’ category. Kaplan-Meier analysis was performed, and a log-rank
test was used to assess statistical significance.
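
The grouping rule and the Kaplan-Meier estimate can be sketched as follows. The function names and the event coding (1 for an observed event, 0 for censoring) are assumptions; the log-rank test is not implemented here.

```python
from typing import List, Optional, Sequence, Tuple


def expression_group(fpkm: float) -> Optional[str]:
    """Assign a sample to the high (> 10 FPKM) or low (< 1 FPKM) group."""
    if fpkm > 10.0:
        return "high"
    if fpkm < 1.0:
        return "low"
    return None  # intermediate samples are left out of the comparison


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each time with an observed event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every sample sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored samples both leave the risk set
    return curve
```

One curve would be computed per group, and the two curves then compared with a log-rank test as described above.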

Tissue  expression  level  percentile  metric.  To  generate  a  metric  to  summarize 
the expression of each lncRNA in breast cancer tissues, we identified the expression
level of the 95th percentile sample among all breast RNA-seq samples, including
cancer tissue, normal tissue, and cell lines.
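
A percentile metric of this kind can be computed with the nearest-rank definition, as sketched below. The text does not state which percentile definition was used, so the nearest-rank rule and the function name are assumptions.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: float = 95.0) -> float:
    """Return the expression level of the sample at the given percentile.

    Uses the nearest-rank rule: the smallest value with at least pct percent
    of the samples at or below it.
    """
    if not values:
        raise ValueError("at least one sample is required")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]
```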

RNA-sequencing  library  preparation.  Total  RNA  was  obtained  from  cancer  cell 
lines,  and  RNA  quality  was  determined  using  an  Agilent  Bioanalyzer.  Poly-A 
transcriptome  libraries  from  the  mRNA  fractions  were  generated  following  the 
Illumina  RNA-seq  protocol.  Each  sample  was  sequenced  in  a  single  lane  with  the 
Illumina HiSeq 2000 (100-nucleotide read length) as previously described (refs 3, 65).
The dUTP method of second-strand marking was used for strand-specific library
preparation as described previously (ref. 66).

Gene  set  enrichment  analysis.  Expression  levels  of  DSCAM-AS1  were  correlated 
(Spearman)  to  the  expression  of  all  protein-coding  genes  across  all  ER-positive 
breast  cancers.  The  protein-coding  genes  were  then  ranked  by  the  Spearman  Rho 
value,  and  used  in  a  weighted,  preranked  GSEA  analysis  against  MSigDB  gene  sets 
V5.0  (ref.  67). 
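
The ranking that feeds the preranked GSEA can be sketched as a Spearman correlation of each gene against DSCAM-AS1, followed by a sort on rho. The helper names and the dictionary layout are hypothetical, and ties are handled with average ranks, which is an assumption about the original analysis.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_genes_by_correlation(
    target: Sequence[float], gene_expression: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (gene, rho) pairs sorted from most positive to most negative rho."""
    scores = {gene: spearman_rho(target, values) for gene, values in gene_expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The sorted list corresponds to the preranked input expected by GSEA, with rho serving as the ranking metric.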

In  silico  binding  prediction.  To  obtain  potential  HNRNPL  binding  sites 
on DSCAM-AS1, we utilized GraphProt (ref. 68) to learn a predictive model from
genome-wide HNRNPL binding sites identified by iCLIP-seq (ref. 48). For training data
generation, we extracted the genomic binding positions (GSE37560) with BED
table scores ≥ 10, followed by an extension of ± 20 nt resulting in 41-nt-long
binding  sites.  After  mapping  the  sites  to  annotated  RefSeq  genes  obtained  from 
UCSC,  an  equally-sized  set  of  negative  sites  was  selected  such  that  the  sites  were  on 
the  same  RefSeq  genes  and  did  not  overlap  with  any  of  the  identified  positive  sites 
from  the  initial  table.  The  GraphProt  sequence  model  trained  on  these  data  was 
then  used  to  identify  high-scoring  sites  in  the  DSCAM-AS1  sequence  (NCBI 
GenBank NR_038899.1). The highest-scoring site, centred at RNA position 923,
contains a CA-repeat motif known for its affinity towards HNRNPL and was thus
used  for  subsequent  analysis. 
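
The construction of positive training sites (score filter followed by a ± 20 nt extension) can be sketched as below. The tuple layout, the 0-based single-nucleotide positions and the half-open output windows are assumptions about the input table; selection of the matched negative sites and the GraphProt training itself are not shown.

```python
from typing import Iterable, List, Tuple

# (chromosome, position, strand, score); this layout is an assumption.
Site = Tuple[str, int, str, float]


def build_training_sites(
    sites: Iterable[Site], min_score: float = 10.0, flank: int = 20
) -> List[Tuple[str, int, int, str]]:
    """Keep sites with score >= min_score and extend each by +/- flank nt.

    Returns half-open windows (chromosome, start, end, strand) of length
    2 * flank + 1, i.e. 41 nt for the default flank of 20.
    """
    windows = []
    for chrom, pos, strand, score in sites:
        if score >= min_score:
            windows.append((chrom, pos - flank, pos + flank + 1, strand))
    return windows
```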


Rapid  amplification  of  cDNA  ends  (RACE).  5'  and  3'  RACE  was  performed 
using  the  GeneRacer  RLM-RACE  kit  (Invitrogen)  according  to  the  manufacturer’s 
protocols. RACE PCR products, obtained using Platinum Taq high-fidelity
polymerase (Invitrogen), were resolved on a 1.5% agarose gel. Individual bands
were gel purified using a Gel Extraction kit (Qiagen), cloned into the pCR4-TOPO
vector, and sequenced using M13 primers.


Single-molecule  fluorescence  in  situ  hybridization.  Single-molecule  fluorescence 
in situ hybridization was performed as described (ref. 69), with some minor
modifications. Cells were grown on 8-well chambered coverglasses, formaldehyde
fixed and permeabilized overnight at 4 °C using 70% ethanol. Cells were rehydrated
in a solution containing 10% formamide and 2 × SSC for 5 min and then treated
with 10 nM fluorescence in situ hybridization probes for 16 h in 2 × SSC containing
10% dextran sulfate, 2 mM vanadyl-ribonucleoside complex, 0.02% RNAse-free
BSA, 1 µg µl^-1 E. coli transfer RNA and 10% formamide at 37 °C. After
hybridization, cells were washed twice for 30 min at 37 °C using a wash buffer
(10% formamide in 2 × SSC). Cells were then mounted in solution containing
10 mM Tris/HCl pH 7.5, 2 × SSC, 2 mM trolox, 50 µM protocatechuic acid and
50 nM protocatechuate dehydrogenase. Fluorescence in situ hybridization samples
were imaged in three dimensions using HILO illumination as described (ref. 70).
Images were processed using custom-written macros in ImageJ. The analysis routine
comprises three major steps: background subtraction, Laplacian of Gaussian (LoG)
filtering and thresholding. Spots with intensity above a set threshold are represented
in  images.  Probes  were  designed  to  target  all  isoforms  of  the  DSCAM-AS1 
transcript. Probe sequences targeting DSCAM-AS1 (21 probes per transcript) are
as follows: 5'-cctatccctttctctaagaa-3', 5'-acttctgcaaaaacgtgctg-3',
5'-ggttccactccattttaatt-3', 5'-ctatagcgtcttatcagctg-3',
5'-catgtgtccggatatcattt-3', 5'-tcagtgagtggataactggt-3',
5'-aattctagtggaggcaccta-3', 5'-ctaagtagcttcatctttcc-3',
5'-caactgcgtgtttcctagtc-3', 5'-agcattctctgttttaacca-3',
5'-ttagcaactgccttgctctg-3', 5'-gctgtccagttttagtaaca-3',
5'-cgttgtgagcctgagagatc-3', 5'-agaacttccctagaggagtg-3',
5'-atggggagtgagaccaaaca-3', 5'-tggaggagggacagagaagg-3',
5'-tgtgggtgattggtactttt-3', 5'-atggatgagtatgtcatgcc-3',
5'-tattgccatggttagcatga-3', 5'-aatgcatgcttgatggagct-3'.


Data  availability.  Sequence  data  that  support  the  findings  of  this  study  have  been 
deposited  in  the  Short  Read  Archive  with  the  accession  code  SRP078392.  Tissue 
ChIP-seq  data  referenced  in  this  study  are  available  in  the  Gene  Expression 
Omnibus  with  the  accession  code  GSE32222.  All  remaining  data  are  contained 
within  the  Article  and  Supplementary  Information  files  or  available  from  the 
author  on  request. 